RSONPath with a Touch of DOM

Master's Thesis | M.Sc. Informatics at TUM

In my master's thesis, I researched how to make rsonpath, an already fast JSON query engine written in Rust, even faster. By introducing elements of a DOM (Document Object Model) into it, I achieved faster query performance at the cost of only a small amount of memory and preprocessing time (paid once at startup) for a newly designed data structure I call the lookup table (LUT).
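To make the idea of a lookup table concrete, here is a minimal Rust sketch. It assumes the LUT maps the byte offset of every opening `{` or `[` to the offset of its matching closing bracket and is built in a single pass at startup. The function name `build_lut` and the use of a plain `HashMap` are illustrative choices, not the actual rsonpath-lut data structure, which may use a different and more compact layout.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: map the byte offset of every opening `{` or `[`
/// to the byte offset of its matching `}` or `]`.
/// Assumes well-formed JSON; brackets inside string literals are ignored.
fn build_lut(json: &[u8]) -> HashMap<usize, usize> {
    let mut lut = HashMap::new();
    let mut stack: Vec<usize> = Vec::new(); // offsets of currently open brackets
    let mut in_string = false;
    let mut escaped = false;

    for (i, &b) in json.iter().enumerate() {
        if in_string {
            // Inside a string: only track escapes and the closing quote.
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' | b'[' => stack.push(i),
            b'}' | b']' => {
                // Pair the closing bracket with the most recent opening one.
                if let Some(open) = stack.pop() {
                    lut.insert(open, i);
                }
            }
            _ => {}
        }
    }
    lut
}

fn main() {
    // The `]` inside the string "x]" must not be treated as a closing bracket.
    let json = br#"{"a": [1, {"b": "x]"}], "c": {}}"#;
    let lut = build_lut(json);
    // prints "indexed 4 containers; root closes at 31"
    println!("indexed {} containers; root closes at {}", lut.len(), lut[&0]);
}
```

Building the table is a one-time linear pass over the input, which is the preprocessing cost the thesis trades for faster skipping later.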

💻 Download

Download the thesis:
📄 View on Google Drive

Author: Ricardo Kraft
Supervisor: Prof. Dr. Jana Giceva
Advisor: M. Sc. Mateusz Gienieczko
Institution: Technical University of Munich
Chair: Chair for Database Systems

📘 Abstract

Efficiently querying large JSON files is a significant challenge due to the trade-offs between preprocessing time, memory usage, and query speed. In this thesis, we extend the streaming JSON query engine rsonpath with a lookup table (LUT) to improve skipping performance. We explain the development and design considerations behind the LUT, and evaluate its impact across different JSON datasets and query types, comparing it to the original rsonpath and a tree-based approach using serde. Our results show that rsonpath-lut achieves notable speedups for queries that skip large portions of the data, particularly when the number of query repetitions is between 10 and 100. However, in cases without significant skipping opportunities, the LUT introduces some additional overhead. We conclude that rsonpath-lut offers a useful trade-off and propose criteria for choosing between rsonpath modes based on query and data characteristics. We believe that this modification could be incorporated as a feature into the rsonpath project, dedicated to improving the discussed query pattern.
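The speedup comes from skipping: when a query does not need a subtree, a streaming engine still has to scan through it to find where it ends (rsonpath does this with SIMD; the sketch uses a plain byte loop), whereas a LUT answers "where does this subtree end?" with a single lookup. The sketch below contrasts the two under simplifying assumptions: `skip_by_scanning` and `skip_with_lut` are hypothetical names, the input is assumed to be well-formed JSON, and falling back to scanning for containers missing from the table is an illustrative design choice, not necessarily rsonpath-lut's behavior.

```rust
use std::collections::HashMap;

/// Baseline: find the end of the container opened at `open` by scanning
/// forward and counting nesting depth (string-aware).
/// Returns `None` if `open` does not point at `{` or `[`.
fn skip_by_scanning(json: &[u8], open: usize) -> Option<usize> {
    if !matches!(json.get(open).copied(), Some(b'{' | b'[')) {
        return None;
    }
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, &b) in json.iter().enumerate().skip(open) {
        if in_string {
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' | b'[' => depth += 1,
            b'}' | b']' => {
                depth -= 1;
                if depth == 0 {
                    return Some(i); // matching close of the starting container
                }
            }
            _ => {}
        }
    }
    None
}

/// LUT-assisted skip: one hash-map lookup, falling back to scanning
/// when the container is not indexed (hypothetical design choice).
fn skip_with_lut(lut: &HashMap<usize, usize>, json: &[u8], open: usize) -> Option<usize> {
    lut.get(&open).copied().or_else(|| skip_by_scanning(json, open))
}

fn main() {
    let json = br#"{"skip": {"big": [1, 2, 3]}, "want": 42}"#;
    // Hypothetical LUT, normally built once during preprocessing:
    // only the object after "skip": (offset 9) is indexed here.
    let mut lut: HashMap<usize, usize> = HashMap::new();
    lut.insert(9, 26);

    let open = 9;
    let end = skip_with_lut(&lut, json, open);
    assert_eq!(end, skip_by_scanning(json, open));
    println!("subtree at {open} ends at {:?}", end);
}
```

The scan costs time proportional to the size of the skipped subtree, while the lookup does not, which is why the gains are largest for queries that skip large portions of the data.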

📊 Example

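[Figure: query time over 0–100 repetitions of $.products[2].shipping[*] on a 1 GB JSON file, comparing serde, rsonpath, and rsonpath-lut]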

The image above compares query times over 0–100 repetitions across four configurations, for a 1 GB JSON file and the query: $.products[2].shipping[*]

  • serde (blue): Starts at 12.5 s because the DOM must be built once before the first query, and this DOM takes about 2–8× more space than the original JSON. Subsequent queries are fast because they run on the pre-built tree.
  • rsonpath (red): Needs no build step and no extra memory, which keeps it lightweight, but it is slower for repeated queries because every query streams through the file again.
  • rsonpath-lut (orange & green): Adds minimal preprocessing (a few MB of memory and a few milliseconds) to significantly accelerate repeated queries, staying competitive with serde up to around 245 repetitions.
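The trade-off in the figure can be reasoned about with a simple linear cost model: each approach pays a one-time build cost B plus a per-query cost q, so n queries cost B + n·q, and a build-heavy approach such as serde overtakes a lighter one after n* = (B_heavy − B_light) / (q_light − q_heavy) repetitions. The Rust sketch below computes this crossover. The model is an assumption (it ignores effects such as caching and warm-up), and the names `Cost` and `break_even` are hypothetical.

```rust
/// Hypothetical linear cost model: one-time build cost plus a fixed cost per query.
#[derive(Debug, Clone, Copy)]
struct Cost {
    build_s: f64,     // preprocessing paid once (seconds)
    per_query_s: f64, // time per query after preprocessing (seconds)
}

impl Cost {
    /// Total time for `n` queries under this model.
    fn total(&self, n: f64) -> f64 {
        self.build_s + n * self.per_query_s
    }
}

/// Repetitions after which `heavy` (bigger build, faster queries) becomes
/// at least as cheap as `light`. Returns `None` if `heavy` never catches up.
fn break_even(light: Cost, heavy: Cost) -> Option<f64> {
    let saved_per_query = light.per_query_s - heavy.per_query_s;
    if saved_per_query <= 0.0 {
        return None; // heavy's queries are not faster, so its build never pays off
    }
    // Solve light.build + n * light.q = heavy.build + n * heavy.q for n.
    Some(((heavy.build_s - light.build_s) / saved_per_query).max(0.0))
}

fn main() {
    // Placeholder numbers for illustration only, not measurements from the thesis.
    let lut = Cost { build_s: 0.5, per_query_s: 0.25 };
    let dom = Cost { build_s: 12.5, per_query_s: 0.125 };
    if let Some(n) = break_even(lut, dom) {
        println!("DOM overtakes LUT after {n} queries");
        println!("at that point both take {} s", lut.total(n));
    }
}
```

With measured build and per-query times in place of the placeholders, this gives a rough estimate of where two curves in the figure cross.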

🧩 Repositories

Aside from the many sources used in the thesis, four repositories are associated with it: three that I mainly created or edited myself, plus the original rsonpath project. The core repository, rsonpath-lut, implements the LUT and other optimizations along with all related benchmarks and analysis scripts.

  • rsonpath-lut: My extended version of rsonpath implementing the LUT optimization.
  • rsonpath-original-measure: An unmodified version of rsonpath that we used as the baseline for comparison.
  • rsonpath-plotting: Python-based visualization and analysis scripts for the thesis experiments.
  • rsonpath: The original streaming JSON query engine.