pdfbox

Apache PDFBox - an open-source Java library for working with PDF documents; handles text extraction, form filling, PDF creation, and rendering pages to images; a lighter-weight choice than Apache Tika (which itself uses PDFBox to parse PDFs) when you only need to handle PDFs
apache, parsing, pdf
tika