Pandas join on string columns failing: ValueError: You are trying to merge on object and int64 columns

I am trying a very simple join on two dataframes: df1 and df2.
I read them in from CSV files, specifying the dtype of the join column:

import pandas as pd

df1 = pd.read_csv("df1.csv", dtype={"code": str})
df2 = pd.read_csv("df2.csv", dtype={"code": str})

The column dtypes are as follows:

df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   local        6 non-null      object
 1   name         6 non-null      object
 2   type         6 non-null      int64 
 3   second_name  6 non-null      object
 4   code         6 non-null      object
 5   item_name    6 non-null      object

df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 14 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   item_id        1 non-null      int64  
 1   item_name2     1 non-null      object 
 2   code           1 non-null      object 
 3   size           1 non-null      float64
 4   category_id    1 non-null      object 
 5   quality        1 non-null      object 
 6   quality_id     1 non-null      int64  
 7   brand          1 non-null      object 
 8   brand_subtype  1 non-null      object 
 9   score          1 non-null      int64  
 10  size.1         1 non-null      object 
 11  country        1 non-null      object 
 12  city           1 non-null      object 
 13  level          1 non-null      int64  
dtypes: float64(1), int64(4), object(9)
memory usage: 240.0+ bytes

The actual contents:

df1
  local name  type second_name code item_name
0   yes  bob     1       jenga    1    triple
1   yes  bob     1       jenga    1    triple
2   yes  bob     1       jenga    1    triple
3   yes  bob     1       jenga    1    triple
4   yes  bob     1       jenga    1    triple
5   yes  bob     1       jenga    1    triple

df2
   item_id item_name2 code  size  ... size.1 country      city level
0     4500     triple    1  0.25  ...  small   china  shanghai     3

To be sure of the dtype of the key column “code”, I also cast it to string explicitly:

df1.code = df1.code.astype(str)
df2.code = df2.code.astype(str)

The problem appears when I try to join (either left or right):

df1.join(df2, how='left', on='code')

I get the following error:
ValueError: You are trying to merge on object and int64 columns

Since I read the code columns explicitly as strings and also cast them afterwards (I get the same error without the extra cast), I don’t see why this is a problem.

I could use pd.merge instead, but that doesn’t explain or solve the problem.

I’m working with Python 3.10.

Any ideas?

  • Does this answer your question? What is the difference between join and merge in Pandas?

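merge works on the string column here, but join doesn’t: df1.join(df2, on='code') compares df1’s code column with the index of df2, not with df2’s code column. df2 has the default RangeIndex, whose dtype is int64, which is where the int64 in the error message comes from.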
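A minimal reproduction makes the int64 visible. It uses small made-up frames in place of the question’s CSV files (the column values are illustrative only):

```python
import pandas as pd

# Toy frames shaped like the question's data (values are made up).
df1 = pd.DataFrame({"name": ["bob", "bob"], "code": ["1", "1"]})
df2 = pd.DataFrame({"item_id": [4500], "item_name2": ["triple"], "code": ["1"]})

# Both "code" columns hold strings...
print(df1["code"].dtype, df2["code"].dtype)

# ...but join(on="code") matches df1["code"] against df2's index,
# which is the default RangeIndex with dtype int64.
print(df2.index.dtype)  # int64

join_error = None
try:
    df1.join(df2, how="left", on="code")
except ValueError as exc:
    # Message of the form "You are trying to merge on ... and int64 columns ..."
    join_error = str(exc)
print("join failed:", join_error)
```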

Try:

df1.merge(df2, how='left', on='code')

df1.join(df2, on='code') always matches against the index of df2, whereas df1.merge(df2, on='code') matches the code column in both frames.
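Both ways out can be sketched on the same kind of toy data (the frames are made up; only the call patterns matter): use merge on the column, or keep join and first move df2’s code column into its index with set_index, so that strings are compared with strings.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["bob", "bob"], "code": ["1", "1"]})
df2 = pd.DataFrame({"item_id": [4500], "item_name2": ["triple"], "code": ["1"]})

# Option 1: merge matches the "code" column in both frames.
merged = df1.merge(df2, how="left", on="code")

# Option 2: keep join, but make df2's "code" column its index first,
# so join(on="code") compares df1's strings with df2's string index.
joined = df1.join(df2.set_index("code"), how="left", on="code")

print(merged)
print(joined)
```

With set_index the code column disappears from df2’s columns, so join also no longer needs lsuffix/rsuffix for the overlapping column name.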

Edit:

Found the reason explained in this answer:

What is the difference between join and merge in Pandas?

I can think of two possible reasons:

  1. Null (NaN) values in the dataset
  2. Key values containing leading or trailing spaces

So try this:

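# Drop rows whose join key is missing (dropna() without subset would also
# drop rows that only have NaN in unrelated columns)
df1.dropna(subset=['code'], inplace=True)
df2.dropna(subset=['code'], inplace=True)
# Remove leading/trailing whitespace from the code column
df1['code'] = df1['code'].str.strip()
df2['code'] = df2['code'].str.strip()
# Now merge on the cleaned column
merged_df = df1.merge(df2.astype({'code': 'str'}), how='left', on='code')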
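To check whether the keys actually matched after cleaning, merge accepts indicator=True, which adds a _merge column with the values "both", "left_only" or "right_only". A small sketch with hypothetical keys, where df2’s code carries a trailing space:

```python
import pandas as pd

# Hypothetical data: df2's key has a trailing space, so "1 " != "1".
df1 = pd.DataFrame({"name": ["bob"], "code": ["1"]})
df2 = pd.DataFrame({"item_id": [4500], "code": ["1 "]})

# indicator=True adds a "_merge" column describing where each row came from.
before = df1.merge(df2, how="left", on="code", indicator=True)
print(before["_merge"].tolist())  # ['left_only']

# Strip whitespace from both key columns, then merge again.
df1["code"] = df1["code"].str.strip()
df2["code"] = df2["code"].str.strip()
after = df1.merge(df2, how="left", on="code", indicator=True)
print(after["_merge"].tolist())  # ['both']
```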

Hope it helps!
