Roseau Catches Breaking Changes Before Your Clients Do

A research team out of LaBRI in Bordeaux has released version 0.6.0 of Roseau, an open-source tool that detects breaking API changes between two versions of a Java library. It occupies the same territory as japicmp and Revapi, with one main structural difference: where those tools compare compiled JARs, Roseau can build its API models from source code directly, which means it can compare a released artifact against the current source tree of a pull request or a local branch.

The tool distinguishes binary-breaking from source-breaking changes, reports in HTML, Markdown, CSV, and JSON, and ships a Maven plugin that runs in the verify phase and fails the build when the current code breaks compatibility against a chosen baseline.

The project is backed by a peer-reviewed paper at ICSME 2025, and the authors report an overall F1 score of 0.94 for distinguishing source- and binary-breaking changes against their benchmark, along with the ability to analyze the JDK's roughly four million lines in nineteen seconds. (Woof.) Those numbers come from the authors' own evaluation, so weigh them as a vendor's benchmarks rather than settled fact; the more persuasive credential is adoption, and JUnit has already wired Roseau into its build as a backward-compatibility check.

The natural first reaction to a tool like this is polite dismissal: "Why do I need this? When I update a library, if it's incompatible, my code fails to compile and my most-excellent test suite fails." Updating a library is, after all, normally a deliberate act. A developer bumps a version on purpose, and if the new version breaks something, everything breaks - compilation, tests, whatever - and the changelog gets read. The problem Roseau solves appears to be a problem the toolchain already solves naturally, for free, as a side effect of doing the work. Why install a tool to detect what the build already detects?

That reasoning is airtight, and it is airtight from exactly one chair: the consumer's. Every claim in it is about what happens when someone uses a library, and it carries with it the assumption that tests are, in fact, good enough to catch incompatibilities once code compiles. And because consuming libraries is something developers do every day while publishing them is something done occasionally, the consumer's position is the one everyone evaluates from by default, including developers who ship libraries themselves.

The producer's circumstance is different. A library maintainer's build always compiles against the maintainer's own current code, which means the maintainer's compiler is structurally incapable of answering the only compatibility question that counts: will the clients break? There is no compilation error for code that lives in someone else's repository. The maintainer's tests pass, the maintainer's build is green, and the breaking change ships anyway, which is precisely why the consumer ends up reading a changelog in the first place. Roseau, run in the verify phase against a released baseline, is the closest thing a producer gets to a compiler for clients that do not exist in the build. That is what JUnit is doing with it: not checking JUnit's dependencies, but checking JUnit against JUnit's own past, on every change, before anything ships.
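To make the producer-side check concrete, here is a minimal sketch of the idea in Java. It is not how Roseau works internally: the class name ApiBaselineCheck and the two interfaces are hypothetical stand-ins for a released baseline and the maintainer's current branch, and the comparison is a plain reflection-based signature diff that only looks at declared public methods.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a toy baseline check, not Roseau's actual analysis.
class ApiBaselineCheck {

    // What clients compiled against in the last release (assumed for illustration).
    interface BaselineApi {
        String render(String template);
        int size();
    }

    // What the maintainer's branch contains now: size() changed its return
    // type and clear() was added. The maintainer's own build is still green.
    interface CurrentApi {
        String render(String template);
        long size();
        void clear();
    }

    // Builds one comparable signature string per declared public method.
    static Set<String> signatures(Class<?> type) {
        Set<String> result = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic() || !Modifier.isPublic(m.getModifiers())) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append(m.getReturnType().getTypeName())
              .append(' ')
              .append(m.getName())
              .append('(');
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(params[i].getTypeName());
            }
            result.add(sb.append(')').toString());
        }
        return result;
    }

    // Signatures the baseline offered that the current code no longer offers.
    static Set<String> missingFromCurrent(Class<?> baseline, Class<?> current) {
        Set<String> missing = new TreeSet<>(signatures(baseline));
        missing.removeAll(signatures(current));
        return missing;
    }

    public static void main(String[] args) {
        Set<String> missing = missingFromCurrent(BaselineApi.class, CurrentApi.class);
        for (String signature : missing) {
            System.out.println("no longer available to existing clients: " + signature);
        }
        // A real check would fail the build here when 'missing' is non-empty.
    }
}
```

Nothing in the maintainer's own compilation complains about the change to size(); only the comparison against the baseline surfaces it. A real tool also has to handle inheritance, generics, visibility, fields, and much more, which is where the accuracy claims start to matter.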

The consumer's reasoning has a second hole, too, even setting aside how complete the tests are: "the compiler and tests will catch it" covers source-breaking changes in code that gets recompiled. It says nothing about binary compatibility, and the distinction is right there in Roseau's sample output: a field changed to static is flagged as binary-breaking but source-compatible. Code that recompiles against the new version is fine. Code that was already compiled against the old version, which is to say every transitive dependency in the tree that touches that field, fails at runtime instead with a linkage error, on whatever code path the test suite happens not to exercise. The compiler is silent by definition; the tests catch it using hopes and dreams, under normal circumstances. And the inverse case exists too: a new abstract method on an interface is binary-compatible but source-breaking, detonating only for the subset of clients who happen to implement the type. A green build samples the compatibility space. It does not cover it.
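The two cases above can be put side by side in a small Java sketch. Everything in it is hypothetical: ConfigV1/ConfigV2 and ListenerV1/ListenerV2 stand in for two releases of an imagined library, and the classify method hard-codes only the two rules described in the paragraph, using the labels the Java Language Specification's binary-compatibility chapter implies for them. It is a toy, not Roseau's rule set.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: classifies two specific kinds of API change.
class CompatKindsSketch {

    // Release 1 of an imagined library.
    static class ConfigV1 { public int retries = 3; }
    interface ListenerV1 { void onEvent(String name); }

    // Release 2: the field became static, the interface grew an abstract method.
    static class ConfigV2 { public static int retries = 3; }
    interface ListenerV2 { void onEvent(String name); void onError(String name); }

    static List<String> classify(Class<?> oldType, Class<?> newType) {
        List<String> findings = new ArrayList<>();

        // Rule 1: instance field turned static. Recompiled clients still build,
        // but binaries compiled against the old version fail to link the field.
        for (Field oldField : oldType.getDeclaredFields()) {
            if (oldField.isSynthetic()) {
                continue;
            }
            try {
                Field newField = newType.getDeclaredField(oldField.getName());
                boolean wasStatic = Modifier.isStatic(oldField.getModifiers());
                boolean isStatic = Modifier.isStatic(newField.getModifiers());
                if (!wasStatic && isStatic) {
                    findings.add("binary-breaking, source-compatible: field "
                            + oldField.getName() + " became static");
                }
            } catch (NoSuchFieldException e) {
                // Field removal is outside this sketch's two rules.
            }
        }

        // Rule 2: abstract method added to an interface. Old binaries still
        // link, but implementers no longer compile until they add the method.
        if (oldType.isInterface() && newType.isInterface()) {
            for (Method newMethod : newType.getDeclaredMethods()) {
                if (!Modifier.isAbstract(newMethod.getModifiers())) {
                    continue;
                }
                try {
                    oldType.getDeclaredMethod(newMethod.getName(), newMethod.getParameterTypes());
                } catch (NoSuchMethodException e) {
                    findings.add("source-breaking, binary-compatible: abstract method "
                            + newMethod.getName() + " added to interface");
                }
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        classify(ConfigV1.class, ConfigV2.class).forEach(System.out::println);
        classify(ListenerV1.class, ListenerV2.class).forEach(System.out::println);
    }
}
```

Neither finding would show up in the library's own build: both versions compile cleanly on their own, and only a comparison between them says which clients are affected and how.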

And the place where this bites hardest is not Guava or JUnit, which have professional maintainers, compatibility policies, and changelogs that get read. It is the internal platform library, the shared artifact some team publishes to a local Nexus installation, consumed by a dozen services, maintained as a part-time concern, and documented by a commit log nobody reads. Open-source libraries break APIs in public, with deprecation cycles and migration guides. Internal libraries break them silently, at 2 PM on a Tuesday, and the first detection mechanism is a NoSuchMethodError in someone else's deployment. Every team that publishes an artifact other teams compile against is a library producer, whether or not the org chart says so, and almost none of them run anything in this category.

Which raises the question this category always raises: if japicmp and Revapi have existed for years and are good tools, why does adoption outside a handful of disciplined open-source projects round to nothing? Roseau's case suggests the mechanism behind that blindness: the evaluation happens from the wrong chair.

A developer hears "breaking change detection," runs the scenario from the consumer's chair, correctly concludes the toolchain already handles it, and files the tool away, never noticing that the scenario was the wrong one and that the producer's chair, the one they also sit in, has no native answer at all. The dismissal is not careless. It is rigorous reasoning applied to the wrong question, which is the most durable kind of wrong there is.

Roseau may or may not displace japicmp; the accuracy claims await independent verification, and source-tree analysis is a genuine differentiator for PR-level checks either way. But the tool is less interesting than the test it poses. Anyone who ships a jar that other people compile against has a compatibility contract, enforced today by nothing but care and memory. The tooling to enforce it structurally has existed for a decade or more and now has an academically rigorous new entrant that fails builds on violation. Whether it gets used comes down to whether producers can be convinced to evaluate it from where they are in the chain, and the evidence suggests that that is the hard part.
