SUM function in Pandas dataframe gives a different total compared to summation in Excel spreadsheet [closed]

Question 1

I am getting different results when I run the sum() function in pandas compared to the totals which show up in Excel. I am pivoting the dataframe in pandas to get different views of the data and check if theres a difference between the totals in pandas and excel.

The complete dataframe (‘merged_data’) is available as a .csv file here: [https://github.com/yyshastri/NZ-Police-Community-Dataset/blob/7562ae2d9b8c4337d9ad8bd436f5d355826820b0/Merged_Community_Police_Data_v2.xlsx][1]

The code is as below:

 # select Auckland and the far North district
TA_data = merged_data[merged_data['TA2018_name'].isin(['Auckland','Far North District'])].copy()
dfp2 = TA_data.pivot_table(index='Year', columns="crime", values="Reported Incidents", aggfunc="sum")
dfp2.head()

This is the result of the above code:

crime	Abduction	Assault	Blackmail	Illegal Use of Motor Vehicle	Illegal Use of Property	Robbery	Sexual Assault	Theft (Other)	Theft From Retail Premises	Theft From a Person	Theft from a Motor Vehicle	Theft of a Motor Vehicle	Unlawful Entry, B&E
Year
2018	96	6095	56	10548	491	1202	541	12831	8670	573	10654	52	19873
2019	92	7042	85	11014	703	1213	596	15893	10012	670	12665	42	22910
2020	67	7135	90	9335	493	961	661	15182	11517	556	8883	30	18791

There is a variance of anywhere between 6 to 200+ between the total from pandas vis-a-vis Excel. For example, the category of “Assault” for 2020 gives 7135 in the df but in Excel it shows 7723.
I might be missing the obvious but would appreciate any pointers.
[1]: https://github.com/yyshastri/NZ-Police-Community-Dataset/blob/7562ae2d9b8c4337d9ad8bd436f5d355826820b0/Merged_Community_Police_Data_v2.xlsx

Question 2

In this case it wouldve been helpful if you would have also given us the information how you found the Excel Sum.

It turns out that your dfp2 data is right but your formula for the Excel search mustve been wrong.

For Assault 2020 in Auckland/Far North District i get 7135 with your python script and 7135 in Excel.

The used Excel formula looks like this:

=SUMIFS( C:C; B:B; "Assault";J:J;2020;H:H;{"Auckland";"Far North District"})

output which equals to 7135.

Here is also another way in python to find 7135 incidents without pivoting the table:

merged_data["Reported Incidents"][(merged_data["crime"]=="Assault")&(merged_data["Reported Incidents"]>=1)& (merged_data["Year"]==2020)&(merged_data["TA2018_name"].isin(["Auckland","Far North District"]))].sum()
#output
7135

Leave a Comment Cancel reply