Two Articles on JSON Query Optimization: A Tool and an Approach

Most developers treat JSON querying as free. It isn't, and two recent articles make the case from opposite directions. One restricts its query language to a regular language so that each query compiles into an automaton that traverses the document once; the other used AI to rewrite a JSONata evaluator in Go and remove the RPC calls around it.

jsongrep: Compile the Query, Not the Interpretation

Micah Kepe's jsongrep attacks the problem at the algorithm level. The core insight: a JSON path query is a regular language - it describes paths through a tree using a grammar with no ambiguity, no recursive lookahead, no edge cases that blow up on unusual input.

That matters because regular languages can be compiled into a deterministic finite automaton. A DFA processes input with O(1) work per symbol - no backtracking, no interpretation at runtime, and crucially, no surprises. Tools like jq and JMESPath interpret path expressions as they traverse the document; jsongrep compiles the query first, then walks the document exactly once, pruning entire subtrees in O(1) when the DFA says they can't match.
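To make the compile-then-walk idea concrete, here is a minimal sketch in Go. It is not jsongrep's code or query syntax: the dotted query form, the `pathDFA` type, and the `find` helper are all hypothetical, and the query language is deliberately tiny (literal keys plus a `*` wildcard), so the automaton is just a chain of states rather than the output of Glushkov's construction. It also decodes the whole document up front, which a real tool would likely avoid. What it does show is the shape of the technique: one constant-time transition per path symbol, and a subtree is skipped the moment the transition lands in the dead state.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// dead is the state from which no match is reachable.
const dead = -1

// pathDFA is a hypothetical, simplified automaton. State i means
// "the first i query segments have matched".
type pathDFA struct {
	segments []string // a literal key, or "*" for any key or array index
}

// compile builds the automaton once, before the document is touched.
func compile(query string) pathDFA {
	return pathDFA{segments: strings.Split(query, ".")}
}

// step is the constant-time transition for one path symbol.
func (d pathDFA) step(state int, symbol string) int {
	if state < 0 || state >= len(d.segments) {
		return dead
	}
	if seg := d.segments[state]; seg == "*" || seg == symbol {
		return state + 1
	}
	return dead
}

// walk visits each node at most once and never enters a subtree
// whose transition lands in the dead state.
func walk(d pathDFA, node any, state int, path string, out *[]string) {
	if state == len(d.segments) { // accepting state
		*out = append(*out, fmt.Sprintf("%s = %v", path, node))
		return
	}
	visit := func(symbol string, child any) {
		if next := d.step(state, symbol); next != dead {
			walk(d, child, next, strings.TrimPrefix(path+"."+symbol, "."), out)
		}
	}
	switch v := node.(type) {
	case map[string]any:
		keys := make([]string, 0, len(v))
		for k := range v {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic output order
		for _, k := range keys {
			visit(k, v[k])
		}
	case []any:
		for i, child := range v {
			visit(strconv.Itoa(i), child)
		}
	}
}

// find decodes doc and returns every "path = value" match for query.
func find(query string, doc []byte) ([]string, error) {
	var root any
	if err := json.Unmarshal(doc, &root); err != nil {
		return nil, err
	}
	var out []string
	walk(compile(query), root, 0, "", &out)
	return out, nil
}

func main() {
	doc := []byte(`{"users":[{"name":"ada","age":36},{"name":"grace"}],"logs":{"big":[1,2,3]}}`)
	matches, err := find("users.*.name", doc)
	if err != nil {
		panic(err)
	}
	for _, m := range matches {
		fmt.Println(m) // the "logs" subtree is never entered
	}
}
```

With the query `users.*.name`, the walk rejects the `logs` key at the root in a single comparison and never looks at anything beneath it. That is the pruning behavior described above, in miniature.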

The formal constraint is also a feature: by keeping the query language strictly regular, jsongrep guarantees predictable performance regardless of query complexity. You give up expressiveness - no filters, no arithmetic - and get a tool you can reason about.

On a 190 MB dataset, the end-to-end benchmark isn't close: jsongrep comes out well ahead of the interpreting tools.

Kepe is upfront about the tradeoffs: jsongrep is a search tool, not a transformation tool. It does one thing, really fast path matching. The article also walks through Glushkov's construction and subset construction in practical detail, which is worth the read on its own as a comparison to Ken Thompson's NFA construction.

gnata: Kill the Language Boundary

Nir Barak's rewrite of JSONata in Go for Reco's pipeline is a different problem solved at the architecture level.

Reco's policy engine evaluated JSONata expressions against billions of events. The reference implementation is JavaScript; the pipeline is Go. Every evaluation crossed a language boundary via RPC - roughly 150 microseconds of overhead before any actual work happened.

Rear Admiral Grace Hopper has a famous presentation about how time adds up. 150 microseconds isn't much, but when you pay for billions of 150-microsecond round trips, it becomes real time and real money.
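The arithmetic is worth doing once. A small Go sketch, using only the 150-microsecond figure quoted above; the call counts are illustrative assumptions, not numbers from the article, and `cumulativeOverhead` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"time"
)

// cumulativeOverhead returns the total time spent on fixed per-call
// overhead, before any useful work is done.
func cumulativeOverhead(calls int64, perCall time.Duration) time.Duration {
	return time.Duration(calls) * perCall
}

func main() {
	perCall := 150 * time.Microsecond // overhead figure quoted in the article

	// Call counts below are illustrative assumptions.
	for _, calls := range []int64{1_000_000, 1_000_000_000} {
		fmt.Printf("%d calls -> %v of pure overhead\n",
			calls, cumulativeOverhead(calls, perCall))
	}
}
```

A million calls cost two and a half minutes of overhead; a billion cost 150,000 seconds, which is more than 41 hours of compute spent crossing the boundary.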

Their solution was gnata: a pure-Go JSONata 2.x implementation with a two-tier evaluator. Simple expressions take a fast path that operates on raw bytes without parsing the document. Complex expressions go through a full parser, and the RPC fleet is gone.
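A two-tier evaluator is easier to picture with a sketch. The Go below is not gnata's code or API: `evaluate`, `fastLookup`, and the regular expression that decides what counts as "simple" are all assumptions made for illustration. The fast tier here is also a stand-in technique. It uses `json.RawMessage` to decode only the objects along the requested path and leave sibling values as undecoded bytes, which is weaker than operating on raw bytes without parsing at all. It does not handle arrays on the path, and the full tier is left as a stub.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// simplePath is an assumed definition of "simple": field names joined
// by dots, such as user.address.city.
var simplePath = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$`)

// errNeedsFullEvaluator stands in for the full parser tier.
var errNeedsFullEvaluator = errors.New("expression needs the full evaluator")

// fastLookup follows a dotted path, decoding only the objects on that
// path. Sibling values stay as undecoded json.RawMessage bytes.
func fastLookup(doc []byte, path string) (json.RawMessage, error) {
	cur := json.RawMessage(doc)
	for _, field := range strings.Split(path, ".") {
		var obj map[string]json.RawMessage
		if err := json.Unmarshal(cur, &obj); err != nil {
			return nil, err // e.g. an array on the path: not handled here
		}
		next, ok := obj[field]
		if !ok {
			return nil, nil // missing field: no result
		}
		cur = next
	}
	return cur, nil
}

// evaluate picks a tier and reports which one it used.
func evaluate(expr string, doc []byte) (tier string, value json.RawMessage, err error) {
	if simplePath.MatchString(expr) {
		value, err = fastLookup(doc, expr)
		return "fast", value, err
	}
	return "full", nil, errNeedsFullEvaluator
}

func main() {
	doc := []byte(`{"user":{"name":"ada","roles":["admin"]},"payload":{"large":[1,2,3]}}`)
	for _, expr := range []string{"user.name", "$sum(payload.large)"} {
		tier, value, err := evaluate(expr, doc)
		fmt.Printf("%-22s tier=%s value=%s err=%v\n", expr, tier, value, err)
	}
}
```

The design point is the dispatch, not the lookup: if most production expressions are plain field paths, routing them around the general-purpose evaluator pays off on nearly every event.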

Their claim: the total build time was one day, with a token cost of $400, saving them potentially half a million dollars a year.

The article is also honest about what came after: building gnata was day one; shadow-mode validation against production traffic was the rest of the week. Along the way they also caught bugs in the reference JSONata implementation - wins on multiple fronts.
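Shadow-mode validation is simple to sketch. The wrapper below is a generic illustration in Go, not Reco's harness: the `Evaluator` signature, the `Mismatch` record, and the `shadow` function are hypothetical names. The existing evaluator stays authoritative, the candidate runs on the same input, and every disagreement is reported instead of being served.

```go
package main

import (
	"fmt"
	"reflect"
)

// Evaluator is a hypothetical signature shared by the old and new engines.
type Evaluator func(expr string, event map[string]any) (any, error)

// Mismatch records one disagreement between the two engines.
type Mismatch struct {
	Expr           string
	Old, New       any
	OldErr, NewErr error
}

// shadow returns an Evaluator that always answers with primary's result
// and reports any case where candidate disagrees. This sketch calls the
// candidate inline; a production harness would more likely run it off
// the request path.
func shadow(primary, candidate Evaluator, report func(Mismatch)) Evaluator {
	return func(expr string, event map[string]any) (any, error) {
		oldVal, oldErr := primary(expr, event)
		newVal, newErr := candidate(expr, event)

		errDiffers := (oldErr == nil) != (newErr == nil)
		if errDiffers || !reflect.DeepEqual(oldVal, newVal) {
			report(Mismatch{Expr: expr, Old: oldVal, New: newVal, OldErr: oldErr, NewErr: newErr})
		}
		return oldVal, oldErr // the candidate never affects the caller
	}
}

func main() {
	// Toy engines: look up a top-level field. The candidate has a bug on "b".
	oldEngine := func(expr string, event map[string]any) (any, error) {
		return event[expr], nil
	}
	newEngine := func(expr string, event map[string]any) (any, error) {
		if expr == "b" {
			return "wrong", nil
		}
		return event[expr], nil
	}

	eval := shadow(oldEngine, newEngine, func(m Mismatch) {
		fmt.Printf("mismatch on %q: old=%v new=%v\n", m.Expr, m.Old, m.New)
	})

	event := map[string]any{"a": 1, "b": 2}
	fmt.Println(eval("a", event))
	fmt.Println(eval("b", event))
}
```

A mismatch does not always mean the new engine is wrong. As the article notes, some of the disagreements turned out to be bugs in the reference implementation.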

The Common Thread

JSON querying at scale rewards investment that most teams defer until it's already expensive. That's a pattern that goes beyond JSON, too.

jsongrep is interesting as a computer science artifact - the DFA approach is elegant and the benchmarks back it up. gnata is interesting as a production story - the methodology (port the test suite, implement until it passes, validate in shadow mode) is directly reusable for any language-boundary problem in your own pipeline.

Both are worth reading.
