When I upload the generated PDF file, ChatGPT fails to parse it and gives the following response:
"It appears that the text extraction from the PDF didn't yield any readable content. This could be due to various reasons, such as the text being embedded as images rather than selectable text, or the PDF having some form of encryption or complex formatting that interferes with text extraction."
ChatGPT provided the following code snippet, which it likely uses to parse the PDF:
from PyPDF2 import PdfFileReader
import os
# Define the path to the uploaded PDF file
pdf_path = '/mnt/data/resume.pdf'
# Initialize a PDF file reader object
pdf_reader = PdfFileReader(open(pdf_path, 'rb'))
# Initialize a variable to hold the extracted text
extracted_text = ''
# Loop through each page in the PDF file and extract the text
for page_num in range(pdf_reader.getNumPages()):
    page = pdf_reader.getPage(page_num)
    extracted_text += page.extractText()
# Show the first 500 characters of the extracted text to give a sense of its contents
extracted_text[:500]
UserWarning: Page.extractText is deprecated and will be removed in PyPDF2 2.0.0. Use Page.extract_text instead. [_page.py:1003]
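
To reproduce this locally, here is a minimal sketch of the same extraction using the non-deprecated names the warning points to. It assumes a PyPDF2 version that ships `PdfReader` and `Page.extract_text` (1.28 or later). The helpers `extract_page_texts` and `pages_without_text` are hypothetical names made up for this example, and this is not necessarily what ChatGPT runs internally.

```python
def extract_page_texts(pdf_path):
    """Return one string per page, using the non-deprecated PyPDF2 names."""
    # Imported here so the helper below stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader  # assumption: PyPDF2 >= 1.28

    with open(pdf_path, "rb") as fh:
        reader = PdfReader(fh)
        # Treat a missing result as an empty string so callers always get str.
        return [page.extract_text() or "" for page in reader.pages]


def pages_without_text(page_texts):
    """Return the 1-based numbers of pages whose extracted text is blank."""
    return [n for n, text in enumerate(page_texts, start=1) if not text.strip()]


# Example usage (the path is the one from the snippet above):
# texts = extract_page_texts('/mnt/data/resume.pdf')
# print(pages_without_text(texts))
```

If every page is reported as blank, the problem is probably in how the PDF is generated (for example, text rendered as images) and not in the deprecated method name.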