Using AWS EMR 6.12.0, PySpark 3.4.0 and Python 3.8.
Full error message:

pyspark.errors.exceptions.captured.AnalysisException:
org.apache.hadoop.hive.metastore.api.InvalidObjectException:
Unknown operator '!=' (Service: AWSGlue; Status Code: 400)
This happens when running:

df.write.mode("append").format("hive").partitionBy(*list_of_cols).saveAsTable("<database>.<table>")
My guess is that one of the operations applied to df contains a '!=' comparison, such as:

df_b = df_z.groupBy(['key']).agg(F.count(F.when(F.col("col_a") != 1, F.col("col_b")).otherwise(F.lit(0)))
df = df_a.join(df_b, ['key'], 'left')
What are some ways I can replace '!='?
There is nothing wrong with !=, but you have a missing closing parenthesis here:

df_b = df_z.groupBy(['key']).agg(F.count(F.when(F.col("col_a") != 1, F.col("col_b")).otherwise(F.lit(0))))
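A quick way to convince yourself that the problem is syntactic rather than the != operator: feed both versions of the line to Python's own parser. This is just an illustrative sketch using the stdlib ast module; the strings below are copies of the aggregation line from the question.

```python
import ast

# The aggregation line as posted in the question: one ')' short.
broken = ('df_b = df_z.groupBy(["key"]).agg(F.count('
          'F.when(F.col("col_a") != 1, F.col("col_b")).otherwise(F.lit(0)))')
# The same line with the final closing parenthesis restored.
fixed = broken + ")"

def parses(src: str) -> bool:
    """Return True if src is syntactically valid Python."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False

print(parses(broken))  # False: unclosed '(' is a SyntaxError
print(parses(fixed))   # True: the '!=' itself parses fine
```

So once the parenthesis is added, the '!=' comparison should go through unchanged; the expression itself is standard PySpark Column syntax.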