Developer Experience is a Performance Feature

Christian Rackerseder recently published Developer Experience is a Performance Feature, arguing that the friction your team feels day to day eventually shows up in the product your users feel.

The argument there is that tooling matters, but a core aspect of delivery is something else: confidence. Engineers who can move with confidence ship better systems. Engineers who can't, don't. The languages and frameworks can change, and will - but the need for confidence doesn't.

Developer experience tends to yield performance not because comfortable engineers write faster code, but because comfortable engineers can see their code. Testability in particular is less a quality discipline than an observability discipline. A system you can put under test is a system whose behavior you can look at, in isolation, with numbers attached.

The reason this matters: most performance problems are not optimization problems. They are discovery problems. Someone may suspect something is slow, but they don't know - a process takes twenty minutes to run, and that's a long time, so nobody wonders which part of those twenty minutes is actually wrong. Or maybe it's fast, but nobody has any expectation of what "fast" means in context, so the 35ms it takes to run is accepted as fine, whether or not it actually is.

This is where testability and developer experience factor in dramatically. Testability means you can run individual pieces of that twenty-minute process, see that four of those minutes are hung up in a single step, and ask whether that step is actually necessary.
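That step-level visibility can be sketched in miniature. The pipeline below is entirely hypothetical - parse/transform/export stand-ins, not any real workflow - and the point is the harness: timing each step in isolation makes the expensive one visible, where the total alone would not.

```python
# A minimal sketch, assuming a hypothetical three-step pipeline. The step
# bodies are stand-ins; the timing harness is the part that matters.
import time

def parse(data):
    return data  # stand-in: pretend to parse the input

def transform(data):
    return data  # stand-in: pretend to transform it

def export(data):
    time.sleep(0.05)  # stand-in for the step that eats most of the runtime
    return data

def time_step(step, data):
    """Run one step and return its result along with elapsed seconds."""
    start = time.perf_counter()
    result = step(data)
    return result, time.perf_counter() - start

data = {"rows": 1000}
timings = {}
for step in (parse, transform, export):
    data, timings[step.__name__] = time_step(step, data)

# Per-step numbers tell you where to look; a single total would not.
print(timings)
```

Once the per-step numbers exist, "four minutes hung up in one step" stops being a guess and becomes a measurement you can argue about.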

That's happened to me; I was tasked with replacing a component in a workflow, where one option was fast but chewed through memory like nobody's business (our customers noticed!), and the other option was low-memory but "had a flaw." The designs of both components were faintly brittle, so I wrote a full replacement; how would I know that it worked?

I ended up building an observable test. The first option, tightly aligned with the dataset, really was expensive: a test over a representative stressful dataset took roughly fifteen seconds to run and went through twenty full garbage collections in the process. My replacement ran in about thirty seconds but had hardly any ties to the dataset, and it triggered about twenty light garbage collections instead. That's still a lot of garbage collected, but it was all short-lived allocation, which meant the process spent far more of its time in the data and less time stressing memory; overall memory consumption was dramatically lower.
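A test like that can be sketched generically. Nothing below is the original harness - the workload is a stand-in - but it shows the shape of an "observable test": wrap the code under test so a run reports elapsed time and garbage-collection counts alongside the result, using Python's standard `gc` module.

```python
# A minimal sketch of an "observable test": run a workload and report
# numbers, not just pass/fail. The workload below is a hypothetical
# stand-in; the measurement wrapper is the point.
import gc
import time

def observe(workload):
    """Run `workload`; return its result, elapsed seconds, and GC
    collections per generation during the run."""
    gc.collect()  # start from a clean slate
    before = [s["collections"] for s in gc.get_stats()]
    start = time.perf_counter()
    result = workload()
    elapsed = time.perf_counter() - start
    after = [s["collections"] for s in gc.get_stats()]
    collections = [b - a for a, b in zip(before, after)]
    return result, elapsed, collections

def stand_in_workload():
    # Allocate enough container objects to trigger the cyclic collector.
    data = [{"next": {}} for _ in range(200_000)]
    return len(data)

result, elapsed, collections = observe(stand_in_workload)
print(f"result={result} elapsed={elapsed:.3f}s gc_per_generation={collections}")
```

The same wrapper around two candidate implementations is what lets you say "fifteen seconds and twenty full collections" versus "thirty seconds and twenty light ones" instead of "one felt slower."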

The "production" option - the "low-memory but has a flaw" option, usually chosen because it didn't have the excessive memory usage of the alternative - really did have a flaw. It really was low-memory, running one GC over its run, but over the stressor dataset it took, well, over four minutes.

It was difficult to observe, so nobody saw. Once it was observed, we traced the problem to a flaw in a recursive structure, fixed it (in about half an hour!), and the "flawed process" now ran in about twenty seconds... still with no special memory consumption patterns.

That's a win all around, especially because this workflow is core to the product. On small datasets, it didn't matter that what should have taken one second took ten. On big datasets, it was the difference between failure and success.

We ended up throwing out the "replacement code." It's still got potential benefits, in that it's more generic than the deployed code, but it's fixing a problem that we no longer have. Again, that's a win.

"Deployed and working" and "understood with confidence" are not the same things, at all, and that's the heart of the point of Mr. Rackerseder's post.

His claim is that developer experience is a performance feature.
The stronger version of his claim, the one his argument leans toward without quite stating, is that developer experience is an epistemic feature.
Good performance is just one of the results when your team can finally see what it has been shipping.
