tika
- Text
- Apache Tika - a content detection and extraction toolkit that parses metadata and text from over 1400 file types through a single unified interface. It is written in Java but usable from any language via its REST server and CLI. It does not yet support GPS or BigTIFF formats.
- Tags
- apache, parsing, content-extraction
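
The entry above says Tika can be reached from any language through its REST server. A minimal sketch of that access path in Python follows. It assumes a Tika server is already running locally on its default port 9998; the helper names `build_request`, `extract_text`, and `extract_metadata` are hypothetical and chosen only for illustration.

```python
import json
import urllib.request

# Assumption: a Tika server is running locally on its default port.
TIKA_URL = "http://localhost:9998"


def build_request(endpoint: str, payload: bytes, accept: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request for a Tika server endpoint."""
    return urllib.request.Request(
        url=f"{TIKA_URL}/{endpoint}",
        data=payload,
        method="PUT",
        headers={"Accept": accept},
    )


def extract_text(payload: bytes) -> str:
    """Send raw file bytes to the /tika endpoint and return the extracted text."""
    req = build_request("tika", payload, "text/plain")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def extract_metadata(payload: bytes) -> dict:
    """Send raw file bytes to the /meta endpoint and return the metadata as a dict."""
    req = build_request("meta", payload, "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, reading a file in binary mode and passing its bytes to `extract_text` or `extract_metadata` should return the parsed content. The request is built separately from sending so it can be inspected without a live server.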