Rolling Mean Window Size [duplicate]

Question 1

I’d like to calculate a rolling mean for a window size of 2, while excluding NaNs as counting as part of the window size. I’d like a rolling mean of the “last 2 non null values”.

I don’t want indexes with a NaN value to count towards the 2 window size. If an index is blank, then the mean should look at the last non-null

The following code works by removing df rows with nulls in df[‘value’]. However, I lose the row with the nan, and want to keep the original data frame intact.

df=df.loc[df['value'].notna()]
df['rolling_mean']=df['value'].transform(lambda x: x.rolling(2, 0).mean())

Expected results

Output for window size 3. Index 5 is averaging indexes 2-5:

   value  rolling_mean3
0    1.0           1.0
1    2.0           2.0
2    6.0           3
3    1.0           3
4    NaN           3
5    5          4

Question 2

You could drop nan's do rolling and then join back:

df1 = df.join(df.dropna().rolling(2).mean(), rsuffix='_rolling_mean')
df1['value_rolling_mean'] = df1['value_rolling_mean'].ffill()
df1
   value  value_rolling_mean
0    1.0                 NaN
1    2.0                 1.5
2    NaN                 1.5
3    3.0                 2.5
4    4.0                 3.5

You can easily change the first NaN to the first value in value

Question 3

Random Init:

choices = [1,2,3,4,np.nan, np.nan,]
np.random.seed(5)
df = pd.DataFrame({'value': np.random.choice(choices, size = 10)})
df

df:

  value
0    4.0
1    NaN
2    1.0
3    2.0
4    1.0
5    NaN
6    4.0
7    1.0
8    1.0
9    NaN

Using your code to calculate for rows where value is not null.
Calculate mean for NA rows separately for window_size = 4 :

window_size = 4

# calculate for not NA rows
df2 = df.copy()
df2=df2.loc[df2['value'].notna()]
df2['rolling_mean']=df2['value'].transform(lambda x: x.rolling(2, 0).mean())
df = df.merge(df2['rolling_mean'], left_index=True, right_index=True, how='left')

# calculate for NA rows
arr = df['value'].to_numpy()
for i in df[df['value'].isna()].index:
  start = min(0, i-5)
  df.iloc[i]['rolling_mean'] = np.mean(arr[:i][(~np.isnan(arr[:i]))][-window_size:])
df

Output:

   value  rolling_mean
0    4.0          4.00
1    NaN          4.00
2    1.0          2.50
3    2.0          1.50
4    1.0          1.50
5    NaN          2.00
6    4.0          2.50
7    1.0          2.50
8    1.0          1.00
9    NaN          1.75

Leave a Comment Cancel reply