pyspark.errors.exceptions.captured.AnalysisException: org.apache.hadoop.hive.metastore.api.InvalidObjectException: Unknown operator "!="

I am using AWS EMR 6.12.0, PySpark 3.4.0, and Python 3.8.

Longer error message:

    pyspark.errors.exceptions.captured.AnalysisException:
    org.apache.hadoop.hive.metastore.api.InvalidObjectException:
    Unknown operator "!=" (Service: AWSGlue; Status Code: 400)

This happens when running:

    df.write.mode("append").format("hive").partitionBy(*list_of_cols).saveAsTable("<database>.<table>")

My guess is that one of the operations applied to df contains a != comparison, such as:

    df_b = df_z.groupBy(['key']).agg(F.count(F.when(F.col("col_a") != 1, F.col("col_b")).otherwise(F.lit(0)))
    df = df_a.join(df_b, ['key'], 'left')

What are some ways I can replace != ?

  • There is nothing wrong with !=, but you are missing a closing parenthesis. With it added, the line reads:

        df_b = df_z.groupBy(['key']).agg(F.count(F.when(F.col("col_a") != 1, F.col("col_b")).otherwise(F.lit(0))))
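
The parenthesis count mentioned in this comment can be checked mechanically. Below is a minimal sketch in plain Python (no Spark needed); the helper name `unclosed_parens` is hypothetical, and it assumes simple string literals without escaped quotes.

```python
def unclosed_parens(expr: str) -> int:
    """Return how many '(' are left unclosed in expr (negative means extra ')')."""
    depth = 0
    quote = None  # the active string delimiter while inside a string literal
    for ch in expr:
        if quote:
            # Inside a string literal: ignore everything until the closing quote.
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
    return depth


# Right-hand side of the df_b assignment exactly as posted in the question.
question_line = '''df_z.groupBy(['key']).agg(F.count(F.when(F.col("col_a") != 1, F.col("col_b")).otherwise(F.lit(0)))'''
# The same expression with one more closing parenthesis appended.
fixed_line = question_line + ")"

print(unclosed_parens(question_line))  # 1
print(unclosed_parens(fixed_line))     # 0
```

A result of 1 for the posted line means `agg(` is never closed; appending one `)` brings the count to 0.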
