I am working on file handling on the backend. I am trying to send a PDF to the backend and then parse it so that I can read its text. Sending the PDF seems to work, but I don't know how to read its text once it arrives on the backend. Here is my POST route:
app.post("/submitPDF", (request, response) => {
console.log("Made a post request>>>", request.body);
// if (!request.files && !request.files.pdfFile) {
// console.log("No file received");
// response.status(400);
// response.end();
// }
pdfParse(request.body.pdfFile).then((result) => {
console.log(result.text);
});
response.status(201).send({ message: "File upload successful" });
});
Here is my API POST request, just to show how I sent the PDF. I created a FormData object, appended my PDF, and then sent it in my POST request:
export const fetchPDF = (value) => {
console.log("The value>>>", value);
const formData = new FormData();
formData.append('pdfFile', value);
console.log(Object.fromEntries(formData.entries())) // this is how to console log items in FormData
return fetch(`${baseURL}/submitPDF`, {
method: 'POST',
headers: {
'Content-Type': 'multipart/form-data', // had to change content-type to accept pdfs. this fixed the cors error
},
body: formData
})
.then((response) => {
if (response.ok) {
console.log("The response is ok");
return response;
} else {
// If not successful, handle the error
console.log("the response is not ok", response);
throw new Error(`Error: ${response.status} - ${response.statusText}`);
}
})
.catch((error) => {
console.log("There is an error>>>", error.message);
})
}
When I console.log request.body, which contains the PDF, I get a Buffer object like this:
Made a post request>>> <Buffer 2d 2d 2d 2d 2d 2d 57 65 62 4b 69 74 46 6f 72 6d 42 6f 75 6e 64 61 72 79 35 57 69 37 45 36 4f 31 49 36 37 45 6f 53 42 32 0d 0a 43 6f 6e 74 65 6e 74 2d … 120 more bytes>
I tried to parse my PDF using pdf-parse like this:
pdfParse(request.body.pdfFile).then((result) => {
console.log(result.text);
});
But I get these 2 errors:
throw new Error('Invalid parameter in getDocument, ' + 'need either Uint8Array, string or a parameter object');
Error: Invalid parameter in getDocument, need either Uint8Array, string or a parameter object
It seems I have to parse the Buffer object, but I'm not sure exactly how to do that. Would I have to convert the Buffer into a string? If so, how do I do that? And would I then use pdf-parse afterwards to read the PDF's text?
You need middleware on the server to handle the multipart file upload.
Multer is a commonly recommended choice:
https://github.com/expressjs/multer
Then update your code to something like:
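const express = require('express')
const multer = require('multer')
const pdfParse = require('pdf-parse')
// memoryStorage keeps the upload in RAM so the bytes are available as req.file.buffer
const upload = multer({ storage: multer.memoryStorage() })
const app = express()
// 'pdfFile' must match the field name passed to formData.append() on the client
app.post('/submitPDF', upload.single('pdfFile'), function (req, res, next) {
  // req.file is the `pdfFile` file
  // req.body will hold the text fields, if there were any
  pdfParse(req.file.buffer)
    .then((result) => {
      console.log(result.text)
      res.status(201).send({ message: 'File upload successful' })
    })
    .catch(next)
})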
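The client side likely needs a change as well. When the Content-Type header is set by hand to 'multipart/form-data', the boundary parameter is missing, and multipart parsers such as Multer cannot split the body into parts. Below is a minimal sketch of the upload without a manual Content-Type, so that the runtime fills in the header together with its boundary. The `baseURL` value and the `buildPdfRequest` helper are assumptions for illustration, not part of the original code, and the sketch relies on the global `FormData`, `Request`, and `fetch` found in browsers and Node 18+.

```javascript
// Hypothetical base URL; the question's `baseURL` is not shown.
const baseURL = 'http://localhost:3000';

// Build the upload request WITHOUT a manual Content-Type header.
// The runtime adds "multipart/form-data; boundary=..." on its own.
function buildPdfRequest(file) {
  const formData = new FormData();
  formData.append('pdfFile', file); // same field name the server expects
  return new Request(`${baseURL}/submitPDF`, {
    method: 'POST',
    body: formData,
  });
}

// Send the request and reject on non-2xx responses.
function fetchPDF(file) {
  return fetch(buildPdfRequest(file)).then((response) => {
    if (!response.ok) {
      throw new Error(`Error: ${response.status} - ${response.statusText}`);
    }
    return response;
  });
}
```

Calling `fetchPDF(file)` with the File object from an `<input type="file">` element should then deliver a body that the server-side middleware can parse.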
Regarding "had to change content-type to accept pdfs. this fixed the cors error": the content type has little to do with CORS, but it may be one of the things breaking your process. Decoded as ASCII, the buffer starts with
------WebKitFormBoundary5Wi7E6O1I67EoSB2
(you can check with an online converter such as rapidtables.com/convert/number/hex-to-ascii.html), so request.body is the raw multipart body rather than the PDF itself, and that is certainly not what a PDF decoder expects.
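The same check can be done in Node instead of an online converter. This is a small sketch that decodes the bytes copied from the logged buffer in the question and compares them against the "%PDF-" signature that PDF files begin with; the variable names are only illustrative.

```javascript
// First bytes of the logged buffer, copied from the question's output.
const hex =
  '2d 2d 2d 2d 2d 2d 57 65 62 4b 69 74 46 6f 72 6d 42 6f 75 6e 64 61 72 79 ' +
  '35 57 69 37 45 36 4f 31 49 36 37 45 6f 53 42 32 0d 0a 43 6f 6e 74 65 6e 74 2d';

// Strip the spaces, turn the hex digits into bytes, then into text.
const bytes = Buffer.from(hex.replace(/\s+/g, ''), 'hex');
const text = bytes.toString('latin1');

console.log(JSON.stringify(text));
// prints "------WebKitFormBoundary5Wi7E6O1I67EoSB2\r\nContent-"

// A PDF file starts with the "%PDF-" signature; this body does not.
const looksLikePdf = bytes.subarray(0, 5).toString('latin1') === '%PDF-';
console.log(looksLikePdf); // prints false
```

The output is the start of a multipart envelope (a boundary line followed by a "Content-" header), which is why the body has to go through a multipart parser before the PDF bytes can be handed to pdf-parse.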