SUM function in Pandas dataframe gives a different total compared to summation in Excel spreadsheet [closed]

I am getting different results when I run the sum() function in pandas compared to the totals which show up in Excel. I am pivoting the dataframe in pandas to get different views of the data and check if theres a difference between the totals in pandas and excel.

The complete dataframe (‘merged_data’) is available as a .csv file here: [https://github.com/yyshastri/NZ-Police-Community-Dataset/blob/7562ae2d9b8c4337d9ad8bd436f5d355826820b0/Merged_Community_Police_Data_v2.xlsx][1]

The code is as below:

 # select Auckland and the far North district
TA_data = merged_data[merged_data['TA2018_name'].isin(['Auckland','Far North District'])].copy()
dfp2 = TA_data.pivot_table(index='Year', columns="crime", values="Reported Incidents", aggfunc="sum")
dfp2.head()

This is the result of the above code:

crime Abduction Assault Blackmail Illegal Use of Motor Vehicle Illegal Use of Property Robbery Sexual Assault Theft (Other) Theft From Retail Premises Theft From a Person Theft from a Motor Vehicle Theft of a Motor Vehicle Unlawful Entry, B&E
Year
2018 96 6095 56 10548 491 1202 541 12831 8670 573 10654 52 19873
2019 92 7042 85 11014 703 1213 596 15893 10012 670 12665 42 22910
2020 67 7135 90 9335 493 961 661 15182 11517 556 8883 30 18791

There is a variance of anywhere between 6 to 200+ between the total from pandas vis-a-vis Excel. For example, the category of “Assault” for 2020 gives 7135 in the df but in Excel it shows 7723.
I might be missing the obvious but would appreciate any pointers.
[1]: https://github.com/yyshastri/NZ-Police-Community-Dataset/blob/7562ae2d9b8c4337d9ad8bd436f5d355826820b0/Merged_Community_Police_Data_v2.xlsx

  • How was this Excel file generated? From pandas? Please show code for minimal reproducible example.

    – 

  • Thanks for responding but the problem is sorted. Please see my comment to the accepted answer. Was a silly mistake on my part. regards.

    – 

In this case it wouldve been helpful if you would have also given us the information how you found the Excel Sum.

It turns out that your dfp2 data is right but your formula for the Excel search mustve been wrong.

For Assault 2020 in Auckland/Far North District i get 7135 with your python script and 7135 in Excel.

The used Excel formula looks like this:

=SUMIFS( C:C; B:B; "Assault";J:J;2020;H:H;{"Auckland";"Far North District"})

output enter image description herewhich equals to 7135.

Here is also another way in python to find 7135 incidents without pivoting the table:

merged_data["Reported Incidents"][(merged_data["crime"]=="Assault")&(merged_data["Reported Incidents"]>=1)& (merged_data["Year"]==2020)&(merged_data["TA2018_name"].isin(["Auckland","Far North District"]))].sum()
#output
7135

Leave a Comment