I have a dataframe and I want to use values from its rows to execute a query (on Delta Lake) and put the results in a new column. However, in a Synapse notebook I always get this error:

> It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063
I have the below function:

```python
def execute_sql(database, table):
    query = f"SELECT COUNT(*) AS count FROM {database}.{table}"
    result = spark.sql(query).collect()[0][0]
    return result
```
and I am applying it like this (there are many more columns; I am only showing two, but I want to keep the others):

```python
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

execute_sql = udf(execute_sql, StringType())
new_df = input_df.withColumn('TotalCount', execute_sql(col("Database"), col("Table")))
display(new_df)
```
I am trying to avoid complicating things by iterating through the dataframe with `rdd.collect`.
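For reference, the driver-side iteration I am trying to avoid looks roughly like this (a sketch only, assuming `input_df` has `Database` and `Table` string columns as above and that a `spark` session exists; it runs one query per row on the driver and rebuilds the dataframe afterwards):

```python
# Collect all rows to the driver, run one spark.sql query per row,
# then rebuild a dataframe with the extra TotalCount column.
rows = input_df.collect()
results = []
for row in rows:
    count = spark.sql(
        f"SELECT COUNT(*) AS count FROM {row['Database']}.{row['Table']}"
    ).collect()[0][0]
    results.append((*row, count))

new_df = spark.createDataFrame(results, input_df.columns + ["TotalCount"])
```

This works because `spark.sql` is only ever called on the driver, but it serializes the whole dataframe to the driver and runs the queries sequentially, which is what I would like to avoid.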