Python ZoneInfo not working in pyspark UDF

Hi Experts I am stuck with an issue where if I use the ZoneInfo from python zoneinfo package the dates are getting converted as expected, but, when I am using the same code under a spark udf it is throwing error as “No timezone found with key Europe/Brussels”. Please help me out here.
Below is the code :

Working python code :

from zoneinfo import ZoneInfo
conv_dt = datetime.strptime('2038.01.09 00:00:00', '%Y.%m.%d %H%M%S').astimezone(ZoneInfo('Europe/Brussels'))

Same code under SPARK UDF NOT working
dataframe creation :

sample_df = spark.createDataFrame(['2038.01.09 00:00:00'], StringType()).toDF('sampledate')

udf declaration :

def test_udf(sd):
    return datetime.strptime(sd, '%Y.%m.%d %H%M%S').astimezone(ZoneInfo('Europe/Brussels'))

calling udf :

x = udf(test_udf, TimeStamp())
cast_df = sample_df.withColumns('sampledate',x(sample_df['sampledate']))
cast_df.show()

Error :
File “…lib/python3.9/zoneinfo/_common.py” ,line 24 in load_tzdata
raise

ZoneInfoNotFoundError:”No timezone found with key Europe/Brussels”

Thanks !

The issue you are facing is likely related to the way Spark UDFs work with external Python packages like zoneinfo. When you use zoneinfo in your Spark UDF, it needs to be available to the Spark workers. For troubleshooting, here are some steps to resolve the issue:

  1. Ensure that the zoneinfo package is installed on all Spark worker nodes. You can use a tool like pip or a package manager to install it. You may need administrative access to install packages on worker nodes.

  2. Make sure that the zoneinfo package is importable within your Spark UDF. To do this, you can include the import statement for ZoneInfo at the beginning of your UDF function.

You can do this by as follows:

from pyspark.sql import SparkSession

from pyspark.sql.functions import udf

from pyspark.sql.types import StringType, TimestampType

from zoneinfo import ZoneInfo

from datetime import datetime

spark = SparkSession.builder.appName(“ZoneInfoExample”).getOrCreate()

def test_udf(sd):
from zoneinfo import ZoneInfo # Import ZoneInfo within the UDF
return datetime.strptime(sd, ‘%Y.%m.%d %H%M%S’).astimezone(ZoneInfo(‘Europe/Brussels’))

x = udf(test_udf, TimestampType())

sample_df = spark.createDataFrame([‘2038.01.09 00:00:00’], StringType()).toDF(‘sampledate’)

cast_df = sample_df.withColumn(‘sampledate’, x(sample_df[‘sampledate’]))
cast_df.show()

Leave a Comment