How to group by timestamp and other columns in Polars

I have a dataframe with many columns, but I am only interested in timestamp, country, province (column prov), and population.

I would like to determine the mean population over time by province. I can calculate this using the following in Polars: df.groupby(['timestamp','country', 'prov']).agg(pl.col('population').mean())

However, I would like to incorporate rolling time windows. I have found the following, but I am unsure whether I can combine both into one expression: df.groupby_rolling('timestamp', period='1y')

I am looking to find the mean per year; perhaps "rolling window" is not the correct terminology.

Any suggestions?

  • Could you provide a reproducible example? Also, see the by argument of groupby_rolling, e.g. df.groupby_rolling('timestamp', by=['country', 'prov'], period='1y')

  • @Wayoshi It seems to work with your suggestion. I was hoping to get one mean per one-year period, but I don’t believe rolling is the way to go; I think I will need to chunk the data into yearly periods before aggregating. Thanks for your help!

  • Then maybe group_by_dynamic is what you are looking for? df.group_by_dynamic("timestamp", every="1y", by=["country", "prov"]).agg(pl.col("population").mean())
