I am currently encountering an issue when attempting to upload PDF files via the API endpoint of an application hosted on a server. The API call results in the following exception:
```json
{
  "detail": "File path pdf_name.pdf is not a valid file or URL"
}
```
The file ‘pdf_name.pdf’ is present in the same directory as the application file (app.py).

NOTE: This issue does not occur with every file upload; sometimes the application works as expected.
Background:
- The application (app.py) is running on a server.
- When uploading the required PDF files via the API endpoint, the specified file path is reported as invalid.
Local Environment:
- Interestingly, the same operation works as expected when the application is hosted locally on my machine using the uvicorn server. Uploading the same PDF files via Postman in the local environment returns the desired results.
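The following is the relevant code from the application:

```python
from fastapi import FastAPI, File, HTTPException, UploadFile
from langchain.prompts import ChatPromptTemplate

# app, chat, r_template, format_instructions and output_parser are defined elsewhere in app.py

@app.post("/analyze_text")
async def analyze_text(resume: UploadFile = File(...), job_description: UploadFile = File(...)):
    try:
        # Extract text from PDFs using langchain
        resume_text = extract_text_from_pdf(resume.filename)
        job_text = extract_text_from_pdf(job_description.filename)
        prompt = ChatPromptTemplate.from_template(template=r_template)
        messages = prompt.format_messages(r_text=resume_text, format_instructions=format_instructions)
        r_response = chat(messages)
        output_dict = output_parser.parse(r_response.content)
        return output_dict
    except Exception as e:
        # except clause not shown originally; something like this yields the {"detail": ...} body (status code is a guess)
        raise HTTPException(status_code=500, detail=str(e))
```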
The following is the `extract_text_from_pdf` function:

```python
from langchain.document_loaders import PDFMinerLoader

def extract_text_from_pdf(file_path):
    pdf_loader = PDFMinerLoader(file_path)
    data = pdf_loader.load()
    text = data[0].page_content  # Assuming the first page content is sufficient for this example
    return text
```
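You can’t (and should never) use `resume.filename` for anything: that’s the file name the user provided for the upload, and in this case it would let the user attempt to load any file on your server. It also explains the intermittent behavior: `PDFMinerLoader` resolves a bare name like `pdf_name.pdf` relative to the server process’s current working directory (not necessarily the directory containing `app.py`), so the request only succeeds when a file with that name happens to exist there, and even then it reads that local file rather than the uploaded content.

Instead, if `PDFMinerLoader` can’t read from a file-like object directly, write the file content to a temporary file (`UploadFile` is a file-like object backed by a spooled temporary file). Do not trust any of the metadata given for the file itself; it’s generally user-provided.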
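A minimal sketch of the temporary-file approach, assuming `PDFMinerLoader` only accepts a path. `save_upload_to_tempfile` and `extract_text_from_upload` are hypothetical helper names, and the `PDFMinerLoader` import path depends on your langchain version (`langchain_community.document_loaders` in newer releases, `langchain.document_loaders` in older ones). The temporary file's name is generated by `tempfile`, so nothing the client sent is ever used as a path:

```python
import os
import shutil
import tempfile
from typing import BinaryIO


def save_upload_to_tempfile(fileobj: BinaryIO, suffix: str = ".pdf") -> str:
    """Copy the uploaded bytes into a new temporary file and return its path.

    The path comes from tempfile, never from the client-supplied filename.
    The caller is responsible for deleting the file afterwards.
    """
    fileobj.seek(0)  # rewind in case something already read from the upload
    # delete=False keeps the file on disk after it is closed, so the loader
    # can reopen it by name (including on Windows)
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        shutil.copyfileobj(fileobj, tmp)
        return tmp.name


def extract_text_from_upload(fileobj: BinaryIO) -> str:
    # Imported lazily so the helper above works without langchain installed;
    # the module path is an assumption that depends on the langchain version.
    try:
        from langchain_community.document_loaders import PDFMinerLoader
    except ImportError:
        from langchain.document_loaders import PDFMinerLoader

    path = save_upload_to_tempfile(fileobj)
    try:
        docs = PDFMinerLoader(path).load()
        return docs[0].page_content
    finally:
        os.remove(path)  # clean up even if parsing fails
```

In the endpoint you would then call `extract_text_from_upload(resume.file)` and `extract_text_from_upload(job_description.file)` instead of passing `.filename`. The `try`/`finally` ensures the temporary file is removed even when the PDF cannot be parsed.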