A simple Java utility that loads a PDF file from a URL and extracts its text content for insertion into an RDBMS or an inverted-index text search engine. It is intended to be used standalone from the command line or integrated with an asynchronous task queue such as Celery.
daredevil82 / pdfparser
Java utility to extract text content from PDF documents and insert into persistent storage
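The download, extract, and store flow described above might be sketched roughly as follows. This is not the project's actual code: `PdfIngest`, `TextExtractor`, and `TextSink` are hypothetical names introduced here for illustration, and the PDF library is left behind an interface because the description does not say which one the project uses. Only the Java standard library is assumed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical seam for whichever PDF library does the text extraction. */
interface TextExtractor {
    String extract(byte[] pdfBytes) throws IOException;
}

/** Hypothetical seam for storage: an RDBMS row, a search-index document, etc. */
interface TextSink {
    void store(String sourceUrl, String text) throws IOException;
}

/** Illustrative pipeline: download a PDF, extract its text, hand it to storage. */
final class PdfIngest {
    private final TextExtractor extractor;
    private final TextSink sink;

    PdfIngest(TextExtractor extractor, TextSink sink) {
        this.extractor = extractor;
        this.sink = sink;
    }

    /** Reads the whole resource behind the URL into memory. */
    static byte[] download(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return in.readAllBytes();
        }
    }

    /** Cheap sanity check: a well-formed PDF starts with the "%PDF-" header. */
    static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        return data.length >= magic.length
                && Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    /** Full flow for a URL: download, then extract and store. */
    String ingest(String url) throws IOException {
        return ingestBytes(url, download(url));
    }

    /** Extracts text from already-downloaded bytes and passes it to the sink. */
    String ingestBytes(String url, byte[] pdfBytes) throws IOException {
        if (!looksLikePdf(pdfBytes)) {
            throw new IOException("Not a PDF: " + url);
        }
        String text = extractor.extract(pdfBytes);
        sink.store(url, text);
        return text;
    }
}
```

Under these assumptions, a command-line entry point or a task-queue worker would construct a `PdfIngest` with a real extractor and sink, then call `ingest(url)` once per document. Keeping extraction and storage behind interfaces is one possible design choice; it lets the same flow target either a database or a search index.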