GraphCompose 1.6.4: A Fluent PDF Generator

Artem Demcha has released GraphCompose v1.6.4, an MIT-licensed, Java-first DSL for generating PDFs from a semantic document model: PDFBox does the actual draw calls, but the engine resolves geometry and pagination first. The release ships clip-path shape containers (addCircle, addEllipse, and addContainer produce a ShapeContainerNode whose children are clipped by ClipPolicy.CLIP_PATH by default), per-layer z-index and rotate/scale transforms on every shape builder, advanced tables (rowSpan, zebra(odd, even), totalRow, repeatHeader()), two new BusinessTheme-driven templates (InvoiceTemplateV2, ProposalTemplateV2), and twenty-two runnable examples.
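The table features are easiest to picture as decisions made at layout time. This sketch is not GraphCompose code. The class TableSketch, the method paginate, and the assumed semantics are all illustrative guesses: zebra(odd, even) is taken to stripe body rows by their position in the data, and repeatHeader() to re-emit the header at the top of each page. It is written in plain Java with no dependencies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of what zebra striping and a repeated header resolve to. Not GraphCompose's API. */
public class TableSketch {

    /** One row as placed by the layout pass: page index, fill, and cell text. */
    record PlacedRow(int page, String fill, String text) {}

    /**
     * rowsPerPage counts the repeated header too, so each page holds
     * one header plus (rowsPerPage - 1) body rows.
     */
    static List<PlacedRow> paginate(String header, List<String> rows, int rowsPerPage,
                                    String oddFill, String evenFill) {
        if (rowsPerPage < 2) {
            throw new IllegalArgumentException("a page needs room for the header and one row");
        }
        List<PlacedRow> out = new ArrayList<>();
        int page = 0;
        int used = 0; // rows already placed on the current page
        for (int i = 0; i < rows.size(); i++) {
            if (used == 0) {
                // Assumed repeatHeader() behaviour: every page with body rows starts with the header.
                out.add(new PlacedRow(page, "header", header));
                used = 1;
            }
            // Assumed zebra(odd, even) behaviour: stripe by data position (rows 1, 3, 5... get oddFill),
            // so the pattern does not restart when the table breaks across pages.
            String fill = (i % 2 == 0) ? oddFill : evenFill;
            out.add(new PlacedRow(page, fill, rows.get(i)));
            used++;
            if (used == rowsPerPage) {
                page++;
                used = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<PlacedRow> placed = paginate("Item | Amount",
            List.of("A", "B", "C", "D", "E"), 3, "white", "grey");
        placed.forEach(System.out::println);
    }
}
```

Whether striping should follow the data or restart on each page is a design choice. Following the data keeps a row's colour stable when pagination shifts.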

Template Generation is front and center

CvTheme is at the heart of the examples and was the primary driver for the library. Its author says the library was open sourced to serve other developers rather than just to fit a specific need, and it now has business themes for invoices, proposals, statements of work, and reports. That it can cater to such different kinds of output speaks well of the design, and the fluent API suggests that nearly anything can be generated, given an AST: the rest is just translation at execution.

That said, resumes are only the tip of the generation iceberg¹, and PDFs are only one kind of output.

The engine versus the templates

The docs scatter the template inputs across separate examples, which makes it hard to see the full data-in, PDF-out path at a glance. The developer's first goal was plain: just make the engine work. Templates are not meant to lock callers in; they are entry points into an engine that will, if you want, hand you back the keys.

Two design decisions in the engine matter more than any individual template.

Layout and render are separate passes. A layout graph resolves geometry first; rendering consumes the resolved fragments. That separation is what makes the project's deterministic snapshot tests practical (layout state is stable across runs and machines, so a snapshot diff flags genuine design drift rather than pagination noise), and it is what lets the engine paginate atomically without manual page accounting. It is also the seam where a format-independent representation can live.
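A minimal two-pass sketch follows. It assumes nothing about GraphCompose's real classes: TwoPassSketch, Block, Fragment, layout, and render are hypothetical names. The layout pass is a pure function from blocks to placed fragments. Any block that does not fit on the current page moves to the next page whole. The render pass only walks what layout produced. Because layout is deterministic, its output can be compared directly across runs, which is the property snapshot tests rely on.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-pass sketch: layout resolves geometry, render only consumes it. Not GraphCompose's API. */
public class TwoPassSketch {

    /** A semantic block: some content plus the height it needs. */
    record Block(String text, double height) {}

    /** A resolved fragment: a block placed on a page at a fixed y offset. */
    record Fragment(int page, double y, Block block) {}

    /** Layout pass: a pure function of its input, so the same input always yields equal fragments. */
    static List<Fragment> layout(List<Block> blocks, double pageHeight) {
        List<Fragment> out = new ArrayList<>();
        int page = 0;
        double y = 0;
        for (Block b : blocks) {
            if (b.height() > pageHeight) {
                throw new IllegalArgumentException("block taller than a page: " + b.text());
            }
            // Atomic pagination: a block that does not fit moves to the next page whole.
            if (y + b.height() > pageHeight) {
                page++;
                y = 0;
            }
            out.add(new Fragment(page, y, b));
            y += b.height();
        }
        return out;
    }

    /** Render pass: makes no geometry decisions, it just walks the resolved fragments. */
    static String render(List<Fragment> fragments) {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            sb.append("p").append(f.page()).append(" y=").append(f.y())
              .append(" ").append(f.block().text()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Block> doc = List.of(
            new Block("Header", 100), new Block("Body", 500), new Block("Totals", 300));
        System.out.print(render(layout(doc, 800)));
    }
}
```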

The renderer is isolated behind a single interface. PDFBox is the production backend today, but the source tree already carries a DOCX renderer on top of Apache POI, declared optional so callers opt in by adding poi-ooxml to their build. The public roadmap lists PPTX and XLSX renderers as the next two items.
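The renderer seam can be sketched the same way. This example is hypothetical: Renderer, PositionedRenderer, and FlowRenderer are illustrative names, and both backends emit text rather than real PDF or DOCX. Both backends consume one list of resolved fragments. The first keeps absolute coordinates, as a fixed-position format would. The second keeps only order and page breaks, as a flowing format might.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: one resolved model, interchangeable backends. Not GraphCompose's API. */
public class RendererSketch {

    /** Output of the layout pass: geometry is already decided. */
    record Fragment(int page, double x, double y, String text) {}

    /** The single seam between the engine and any output format. */
    interface Renderer {
        String render(List<Fragment> fragments);
    }

    /** Text stand-in for a fixed-position backend such as PDF: keeps absolute coordinates. */
    static final class PositionedRenderer implements Renderer {
        @Override
        public String render(List<Fragment> fragments) {
            StringBuilder sb = new StringBuilder();
            for (Fragment f : fragments) {
                sb.append(String.format(Locale.ROOT, "[p%d @%.0f,%.0f] %s",
                        f.page(), f.x(), f.y(), f.text()))
                  .append('\n');
            }
            return sb.toString();
        }
    }

    /** Text stand-in for a flow backend such as DOCX: keeps order and page breaks, drops coordinates. */
    static final class FlowRenderer implements Renderer {
        @Override
        public String render(List<Fragment> fragments) {
            StringBuilder sb = new StringBuilder();
            int currentPage = fragments.isEmpty() ? 0 : fragments.get(0).page();
            for (Fragment f : fragments) {
                if (f.page() != currentPage) {
                    sb.append("--- page break ---\n");
                    currentPage = f.page();
                }
                sb.append(f.text()).append('\n');
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        List<Fragment> resolved = List.of(
            new Fragment(0, 50, 50, "Invoice #1"),
            new Fragment(1, 50, 50, "Totals"));
        List<Renderer> backends = List.of(new PositionedRenderer(), new FlowRenderer());
        for (Renderer r : backends) {
            System.out.print(r.render(resolved));
        }
    }
}
```

In this shape, adding a format means adding one class that implements the interface. The layout pass and the fragment list stay untouched.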

We could be looking at a simple, generalizable multi-format generator. It's not as flexible as pandoc, but pandoc has to cater to so many specific inputs and outputs that its translations are lossy almost by definition. This isn't trying to solve pandoc's problems, and so it doesn't have to solve pandoc's problems.

What that means

GraphCompose looks like a generalized library with some specific useful functionality out of the gate, much like a modular synthesizer that ships with some useful default patches: if you need a CV generator (and who doesn't want one of those these days?), it has a CV generator. If you want an opinionated PDF DSL, you get an opinionated PDF DSL.

But again, the fluent API gives you something more: a semantic document AST with a renderer contract attached. Author the document once in the Java DSL (or create an AST from a set of inputs!), resolve geometry once, and a backend that knows how to consume the resolved fragments can target whatever output format it understands.
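Creating an AST from a set of inputs might look something like the following. The node types (Heading, Paragraph, Table), the LineItem record, and the invoice mapping are hypothetical, not GraphCompose's model. The point is that the tree carries meaning and no geometry, so a layout pass and any renderer can run over it afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: plain input data mapped to a semantic document tree. Not GraphCompose's API. */
public class AstSketch {

    /** Semantic nodes: what the content is, with no positions or page numbers. */
    sealed interface Node permits Heading, Paragraph, Table {}
    record Heading(String text) implements Node {}
    record Paragraph(String text) implements Node {}
    record Table(List<String> header, List<List<String>> rows) implements Node {}

    /** Input data, independent of any layout or output format. */
    record LineItem(String description, int quantity, long unitCents) {}

    /** Builds the document tree for an invoice; amounts are kept in cents to avoid rounding drift. */
    static List<Node> invoice(String customer, List<LineItem> items) {
        List<List<String>> rows = new ArrayList<>();
        long total = 0;
        for (LineItem li : items) {
            long line = li.quantity() * li.unitCents();
            total += line;
            rows.add(List.of(li.description(), Integer.toString(li.quantity()), cents(line)));
        }
        return List.of(
            new Heading("Invoice"),
            new Paragraph("Bill to: " + customer),
            new Table(List.of("Item", "Qty", "Amount"), rows),
            new Paragraph("Total: " + cents(total)));
    }

    /** Formats a non-negative cent amount as units.cents. */
    static String cents(long c) {
        return String.format(Locale.ROOT, "%d.%02d", c / 100, c % 100);
    }

    public static void main(String[] args) {
        List<Node> doc = invoice("ACME", List.of(
            new LineItem("Widget", 3, 250), new LineItem("Setup", 1, 1000)));
        doc.forEach(System.out::println);
    }
}
```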

That is what most existing Java options do not give you. iText and PDFBox draw PDFs, and Apache POI authors Office formats (... sort of) but shares no structural model with the PDF stack. A document targeting both backends has to be authored twice, and because generation isn't shared, users need to anticipate format-specific inconsistencies. GraphCompose is asserting that the document model is the durable artifact and the renderer is the swappable part. The DOCX path already living in the source tree is a proof of concept; PPTX and XLSX would extend this design rather than rebuild it.

What to watch

The DOCX renderer in the source tree is a proof of concept; PPTX and XLSX are the test of whether the layout-render separation actually pays for itself across genuinely different output models. PDF and DOCX both want flowing text on pages. XLSX is a grid. PPTX is fixed-position slides. If a single semantic model resolves cleanly to all four, the design holds. If it doesn't, the renderer interface is going to gain seams, and the seams will tell you where the abstraction was leaking. Either result is informative.

The other thing to watch is whether the AST becomes a public contract. Right now the renderer interface is internal, which means GraphCompose owns both ends and can change either without breaking anyone. The interesting future is one where the document model is documented and stable enough that someone outside the project can write a renderer against it without forking. That's the difference between a useful library and an actual standard, and it's a choice the project will have to make at some point.

¹ It really is an iceberg. Most of it's hidden, you know it's lurking out there, and it's going to sink everything that comes into contact with it. There's a reason people "just use Markdown" - or Word - in abject surrender.
