Need to perform an aggregation operation in a PySpark streaming job

I’m receiving data from multiple sensors as a stream, at 1-minute intervals, in Databricks. For each PySpark streaming load, I need to create a new sensor named ‘PQRS’ whenever sensors ‘ABC’ and ‘DEF’ are both present. The value of the new sensor should be the average of the values of ABC and DEF.

Input data (1-minute stream)

sensor_name  value  timestamp
ABC          10     2023-11-02T11:49:32.028Z
DEF          20     2023-11-02T11:49:32.028Z
GHI          12     2023-11-02T11:49:32.028Z

Expected output data

sensor_name  value  timestamp
ABC          10     2023-11-02T11:49:32.028Z
DEF          20     2023-11-02T11:49:32.028Z
PQRS         15     2023-11-02T11:49:32.028Z
GHI          12     2023-11-02T11:49:32.028Z
