Text parser script for several file formats, currently supporting .docx, .doc, and .pdf.
Place all files that need to be parsed in a single directory, together with this script.
The script converts every supported file to .txt output and saves it in a subdirectory named "RAW_RESULTS", which is created automatically in the directory the script is run from.
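The file discovery and output steps described above might look roughly like this in Python. This is a minimal sketch, not the script's actual code. The function names `collect_input_files` and `save_result` are hypothetical. The sketch also assumes that "the directory the script is run from" means the current working directory, and it leaves out the actual text extraction.

```python
import os

# Extensions listed as supported in this README (compared case-insensitively).
SUPPORTED_EXTENSIONS = {".doc", ".docx", ".pdf"}
OUTPUT_DIR_NAME = "RAW_RESULTS"


def collect_input_files(directory="."):
    """Return the sorted names of supported files directly inside `directory`."""
    names = []
    for name in os.listdir(directory):
        ext = os.path.splitext(name)[1].lower()
        # Skip subdirectories (e.g. RAW_RESULTS itself) and unsupported files.
        if ext in SUPPORTED_EXTENSIONS and os.path.isfile(os.path.join(directory, name)):
            names.append(name)
    return sorted(names)


def save_result(name, text, base_dir="."):
    """Write `text` to RAW_RESULTS/<stem>.txt under `base_dir`, creating the folder if needed."""
    out_dir = os.path.join(base_dir, OUTPUT_DIR_NAME)
    os.makedirs(out_dir, exist_ok=True)  # no error if RAW_RESULTS already exists
    out_path = os.path.join(out_dir, os.path.splitext(name)[0] + ".txt")
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return out_path
```

Because this sketch names each output after the input file's stem, `report.doc` and `report.docx` would both write `report.txt`. Keeping the original extension in the output name, as in `report.docx.txt`, would avoid that collision.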
Tool used for each format:
.doc - antiword
.docx - docx2txt
.pdf - PyPDF2
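The following is a hypothetical sketch of how a file extension could be mapped to its backend. It makes three assumptions. First, the `.doc` tool is the `antiword` command-line program and is on `PATH`. Second, the `.docx` tool is the `docx2txt` Python package. Third, PyPDF2 is version 2.x or later, which provides `PdfReader`. The function names are illustrative only. The third-party imports are placed inside the functions so that the module still loads when one backend is not installed.

```python
import os
import subprocess


def extract_doc(path):
    # Assumption: the `antiword` CLI is installed; it prints the document text to stdout.
    result = subprocess.run(["antiword", path], capture_output=True, text=True, check=True)
    return result.stdout


def extract_docx(path):
    import docx2txt  # third-party: pip install docx2txt
    return docx2txt.process(path)


def extract_pdf(path):
    from PyPDF2 import PdfReader  # third-party: pip install PyPDF2 (2.x/3.x API)
    reader = PdfReader(path)
    # extract_text() can yield empty text for scanned (image-only) pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Extension (lower-case) -> extractor function.
EXTRACTORS = {".doc": extract_doc, ".docx": extract_docx, ".pdf": extract_pdf}


def extract_text(path):
    """Pick the extractor by file extension and return the document's plain text."""
    ext = os.path.splitext(path)[1].lower()
    try:
        extractor = EXTRACTORS[ext]
    except KeyError:
        raise ValueError(f"unsupported file type: {ext!r}") from None
    return extractor(path)
```

A lookup table keeps each backend isolated. Supporting another format then only requires one new function and one new dictionary entry.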