Converting PDF bank statements to Xlsx/CSV file ❤️ {UPDATED 2023}

I’m trying to use tabula and Pandas concat in Python to PDF bank statements into a CSV/Xlsx file, so I can automate the task of manually entering them into Excel, however after intense experimentation and testing, it refuses extract more than 1 table (only their column headers).

All the tables in the PDF have the same columns (Date, Balance, etc.), where I just want to stack them vertically and sort them by date.

Here’s how I’m trying to extract and concatenate the tables from the PDF as efficiently as possible:

df = pd.concat(tabula.read_pdf(filename, pages="all", multiple_tables=True))

tabula.convert_into(filename, "Converted Document.csv", output_format="csv", pages="all")

The columns for the other months are being shown as expected, however the transactions are missing for all months except the first.

I’ve looked at other solutions here, but I haven’t found a way to get all transactions showing in the converted Excel file.

A Minimal, Reproducible Example (MRE) would be useful here.

–
The 2 lines of code I’ve shown are the only ones I’m using to extract and convert from PDF to Excel. I’m not sure what else to do

–
Do you have an example PDF that can be used to demonstrate the issue?

–

Leave a Comment Cancel reply