I’d like to calculate a rolling mean for a window size of 2, while excluding NaNs as counting as part of the window size. I’d like a rolling mean of the “last 2 non null values”.
I don’t want indexes with a NaN value to count towards the 2 window size. If an index is blank, then the mean should look at the last non-null
The following code works by removing df rows with nulls in df[‘value’]. However, I lose the row with the nan, and want to keep the original data frame intact.
df=df.loc[df['value'].notna()]
df['rolling_mean']=df['value'].transform(lambda x: x.rolling(2, 0).mean())
Output for window size 3. Index 5 is averaging indexes 2-5:
value rolling_mean3
0 1.0 1.0
1 2.0 2.0
2 6.0 3
3 1.0 3
4 NaN 3
5 5 4
You could drop nan's
do rolling and then join back:
df1 = df.join(df.dropna().rolling(2).mean(), rsuffix='_rolling_mean')
df1['value_rolling_mean'] = df1['value_rolling_mean'].ffill()
df1
value value_rolling_mean
0 1.0 NaN
1 2.0 1.5
2 NaN 1.5
3 3.0 2.5
4 4.0 3.5
You can easily change the first NaN
to the first value in value
Random Init:
choices = [1,2,3,4,np.nan, np.nan,]
np.random.seed(5)
df = pd.DataFrame({'value': np.random.choice(choices, size = 10)})
df
df:
value
0 4.0
1 NaN
2 1.0
3 2.0
4 1.0
5 NaN
6 4.0
7 1.0
8 1.0
9 NaN
Using your code to calculate for rows where value is not null.
Calculate mean for NA rows separately for window_size = 4 :
window_size = 4
# calculate for not NA rows
df2 = df.copy()
df2=df2.loc[df2['value'].notna()]
df2['rolling_mean']=df2['value'].transform(lambda x: x.rolling(2, 0).mean())
df = df.merge(df2['rolling_mean'], left_index=True, right_index=True, how='left')
# calculate for NA rows
arr = df['value'].to_numpy()
for i in df[df['value'].isna()].index:
start = min(0, i-5)
df.iloc[i]['rolling_mean'] = np.mean(arr[:i][(~np.isnan(arr[:i]))][-window_size:])
df
Output:
value rolling_mean
0 4.0 4.00
1 NaN 4.00
2 1.0 2.50
3 2.0 1.50
4 1.0 1.50
5 NaN 2.00
6 4.0 2.50
7 1.0 2.50
8 1.0 1.00
9 NaN 1.75
Post your data as text instead of image