I have a dataframe with many columns, but I am only interested in timestamp, country, province, and population.
I would like to determine the mean population over time by province. I can calculate this in Polars with the following: df.groupby(['timestamp', 'country', 'prov']).agg(pl.col('population').mean())
However, I would like to incorporate rolling time windows. I have found the following, but I am unsure whether I can combine both into one expression: df.groupby_rolling('timestamp', period='1y')
I am looking to find the mean by year, so perhaps "rolling window" is not the correct terminology.
Any suggestions?
Any reproducible example? Also, see the by argument for groupby_rolling, e.g. df.groupby_rolling('timestamp', by=['country', 'prov'], period='1y')
@Wayoshi, it seems to be working with your suggestion. However, I was hoping to get one mean per one-year period, so I don’t believe rolling is the way to go. I think I will need to chunk it up before aggregating. Thanks for your help!
Then maybe group_by_dynamic is what you are looking for? df.group_by_dynamic("timestamp", every="1y", by=["country", "prov"])