Comprehension Debt: the Cost of Relying on AI Code

Addy Osmani has published an essay called "Comprehension Debt - the hidden cost of AI generated code." It argues that "Comprehension debt is the growing gap between how much code exists in your system and how much of it any human being genuinely understands" - a statement that resonates.

The problem is that AI can indeed generate code incredibly quickly - faster than humans can integrate, understand, or evaluate it. The result is a codebase that "works" - code created with "just make the tests pass, please" - but that humans don't understand and cannot, or do not, review. It's code that's "correct" by definition only, and that carries a huge downstream cost for entire organizations.

Osmani cites a recent Anthropic study that found a 17% comprehension gap between coders who used AI and those who didn't, with gaps in conceptual understanding and especially in debugging capability. "The researchers emphasize that passive delegation ('just make it work') impairs skill development far more than active, question-driven use of AI."

Comprehension is the job, according to Osmani. Developers who rely on AI for implementation rather than design - who set the controls for the heart of the sun and go to sleep - are actively creating comprehension gaps: the system "runs," but ... does it, really? And if the system is built on implementations that nobody actually understands, changes become harder and more expensive - and possibly more dangerous when AI-generated code lands in mission-critical environments or in systems that affect healthcare.

The advice for readers is clear: don't avoid AI, but use it well. Direct it carefully, understand and review its output, work in small changes, write tests based on your requirements, and verify that the tests still enforce those requirements - some LLMs will remove test conditions altogether, treating them as barriers to success. Test review may be the single most important discipline in environments that use AI.

It's almost a joke, except it's not very funny: "Make it say Hello, world" - with the LLM responding "'World' is problematic, I'll just make sure it outputs something." The test "passes," but the program no longer does what it was asked to do.

Comprehension is a core part of our field, no matter what our role in it, and no tool changes that. If there's any advice here, it's to take care.
