DragonflyDB has been advertising relatively heavily on Reddit lately, and as ad campaigns go, it's pretty effective: it got me to wonder what the fuss was, and whether it was deserved.
Dragonfly is a from-the-ground-up reimplementation of the Redis wire protocol, written by Roman Gershman and a team with serious data-infrastructure credentials. The transactional layer is built on the VLL lock-manager paper¹ from main-memory database research, and the core hashtable design adapts the Dash paper from the persistent-memory world. The architecture is shared-nothing across cores, with each shard owned by exactly one thread, which means atomic multi-key operations don't require mutexes.
On large multi-core boxes, that scales vertically in ways that Redis' design does not, and Dragonfly also focuses heavily on snapshot optimization - no fork-and-copy-on-write memory spike under a write-heavy load.
If Redis is hitting a vertical-scaling ceiling and running a cluster is undesirable, Dragonfly might be entirely credible.
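The shared-nothing, shard-per-thread design described above can be sketched in a few lines. This is a toy illustration of the idea, not Dragonfly's implementation: all names are mine, and real shard routing, transactions, and the wire protocol are elided. The key property is that each shard's data is touched by exactly one thread, so no mutex guards the hashtable itself.

```python
import queue
import threading

NUM_SHARDS = 4  # in the real system, roughly one shard per core


class Shard:
    """Owns a slice of the keyspace; only its own thread touches `data`."""

    def __init__(self):
        self.data = {}  # no lock needed: single-writer by construction
        self.inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # The shard thread is the sole owner of self.data.
        while True:
            op, key, value, reply = self.inbox.get()
            if op == "SET":
                self.data[key] = value
                reply.put("OK")
            elif op == "GET":
                reply.put(self.data.get(key))


shards = [Shard() for _ in range(NUM_SHARDS)]


def route(key):
    """Hash-based routing: each key deterministically maps to one shard."""
    return shards[hash(key) % NUM_SHARDS]


def set_(key, value):
    reply = queue.Queue()
    route(key).inbox.put(("SET", key, value, reply))
    return reply.get()


def get(key):
    reply = queue.Queue()
    route(key).inbox.put(("GET", key, None, reply))
    return reply.get()
```

Coordination happens by passing messages to the owning thread rather than by locking shared state; multi-key atomicity across shards is where the VLL-style transactional layer comes in, which this sketch does not attempt.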
The headlines are splashy, though, and the data wants analysis. The "25x faster than Redis" claim comes from a specific configuration: a single Dragonfly instance against a single-process Redis on a high-end multi-core system. Redis responded with its own benchmark, pitting a 40-shard Redis cluster against Dragonfly on the same hardware, and came out 18-40% ahead.
Both benchmarks are honest, but benchmarks tend to favor the authors who wrote them: they're often chosen to flatter and to market. ("We're the best version of us there is!") The reality is that your numbers depend heavily on your actual hardware, workload mix, cluster availability, read/write ratio, and persistence requirements.
The license for Dragonfly is BSL 1.1² - source-available rather than OSI open source, permissive for self-hosting and embedding, but restrictive on offering Dragonfly as a managed service. (This is likely a response to vendors like AWS selling open source as managed services without remunerating the actual authors: imagine writing the best ThingDoer ever and watching AWS provide ThingDoer as a paid service without you seeing anything from it.)
The license points at a governance issue, too: Dragonfly is one company, whereas Valkey - the open-source fork of Redis - has Linux Foundation stewardship with AWS, Google, Oracle, Ericsson, and Snap behind it. If you're betting the farm on a product, this matters.
The ads, though, all pose the same implicit question: is your cache fast enough? And the answer they offer is "No, but ours would be." That's fine if they can back it up, and there's no reason to suspect they can't - but it's worth asking whether a cache should be in your architecture at all.
Caches can be horribly useful, but applied to live data they can be a code smell: cache logic infects application code, even when hidden behind cache loaders or annotations that provide the caching branches. These patterns exist because systems of record are too slow, and the cache becomes a band-aid rather than a cure - when the answer is to use a faster database.
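The infection is easy to see in the classic cache-aside pattern. This is a minimal illustration (the names and TTL are mine): the caching branch lives inside application code, and every write path now has to remember the cache exists or readers see stale data.

```python
import time

_cache = {}  # key -> (value, expires_at)
TTL = 30.0   # seconds; an arbitrary choice for illustration


def load_user(user_id, db):
    """Cache-aside: check the cache, fall back to the store, backfill."""
    hit = _cache.get(user_id)
    if hit is not None and hit[1] > time.monotonic():
        return hit[0]  # the caching branch, right in business logic
    user = db[user_id]  # stand-in for the real query
    _cache[user_id] = (user, time.monotonic() + TTL)
    return user
```

Note what's missing: invalidation. If anything updates `db` without also evicting from `_cache`, readers get stale results until the TTL expires - the coordination burden that makes caching a smell on live data.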
The options for faster databases have gotten much nicer than they were: it used to be that MySQL was the "fast database" and you endured its feature set as a tradeoff for speed, or you went to GigaSpaces or Coherence, which served as systems of record with databases as secondary stores - writes went to the data grid and were reflected into a database for long-term storage.
Now, PostgreSQL is much faster than it was, with a vastly expanded feature set, and read replicas handle workloads that used to demand caches. It doesn't stop there: ScyllaDB, FoundationDB, CockroachDB, ClickHouse... the list could go on for what feels like pages, addressing an incredible variety of access patterns. The IMDG model that Coherence and GigaSpaces exemplified (and still represent well) has diffused into the mainstream rather than persisting as a separate product category.
Dragonfly's own marketing is sliding away from "cache" and into more general territory: the landing page describes the product as in-memory infrastructure for caching - and feature stores, job queues, and more. Feature stores and job queues aren't cache use cases; they're system-of-record use cases for data that needs low-latency access. Dragonfly is converging on the IMDG shape from the cache side, which suggests that "cache" sells better than "in-memory data grid."
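The distinction shows up in code. A job queue built on list operations holds the only copy of each job: losing an entry is data loss, not a cache miss. A toy sketch of the semantics (plain Python standing in for Redis-style list commands; the class and method names are mine):

```python
from collections import deque


class JobQueue:
    """Mimics LPUSH/RPOP list semantics: the queue IS the data."""

    def __init__(self):
        self.items = deque()

    def lpush(self, job):
        # Producer side: this queue entry is the only copy of the job.
        self.items.appendleft(job)

    def rpop(self):
        # Consumer side: once popped, the job exists nowhere else.
        # That makes this a system of record, not a cache in front of one.
        return self.items.pop() if self.items else None
```

Nothing here can be "rebuilt from the database on a miss" - which is exactly why calling it a cache understates what it is.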
That may be the real story: Dragonfly asks users "which cache do you want?" while offering something with familiar cache semantics that is not actually a cache but a data grid, all propelled by the old canard that "the database can't keep up."
That's not an argument for or against Dragonfly: it's an argument that nothing replaces understanding your own database access patterns, and that those patterns should dictate our choices, rather than the presumption that the only tool available is pen and paper.
---

¹ This might be paywalled: a freely available version is here.

² This is MariaDB's BSL; Dragonfly's actual LICENSE is also available in their source repository.