This is my dataframe:
import pandas as pd

df = pd.DataFrame(
    {
        'a': [110, 115, 112, 180, 150, 175, 160, 145, 200, 205, 208, 203, 206, 207, 208, 209, 210, 215],
        'b': [1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1],
    }
)
And this is the output that I want. I want to create column c.
      a  b    c
0   110  1  110
1   115  1  115
2   112  0  115
3   180  1  180
4   150  0  180
5   175  1  180
6   160  0  180
7   145  0  180
8   200  1  200
9   205  1  205
10  208  1  208
11  203  0  208
12  206  1  208
13  207  1  208
14  208  1  208
15  209  1  209
16  210  1  210
17  215  1  215
b is 1 when df.a > df.a.shift(1); otherwise it is 0.
Steps needed:
a) Find where the streak of 1s in b ends.
b) Keep the maximum value of the streak.
c) Put that value in c until a greater value is found in a.
For example, when 180 is found in a:
a) Row 3 has a streak of 1s.
b) The maximum value of the streak is 180.
c) df.c = 180 until a greater value is found in a. In this case it is 200 at row 8.
It was not easy to describe the problem, and I may have used the wrong words, so if there are any questions, feel free to ask in the comments.
I would also really appreciate a built-in or otherwise clean way to create column b. I put those 1s and 0s in manually.
This is what I have tried, but it does not feel like the correct approach:
df['streak'] = df['b'].ne(df['b'].shift()).cumsum()
df['max'] = df.groupby('streak')['a'].max()
You just want cummax:
df['c'] = df['a'].cummax()
Output:
      a  b    c
0   110  1  110
1   115  1  115
2   112  0  115
3   180  1  180
4   150  0  180
5   175  1  180
6   160  0  180
7   145  0  180
8   200  1  200
9   205  1  205
10  208  1  208
11  203  0  208
12  206  1  208
13  207  1  208
14  208  1  208
15  209  1  209
16  210  1  210
17  215  1  215
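The question also asks for a cleaner way to build column b. One possible sketch, not necessarily the only way, is to compare each value of a with the previous one using diff(). The first row has no previous value, so diff() returns NaN there; treating that row as a rise with fillna(1) is an assumption made here only so the result matches the hand-written b in the question.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [110, 115, 112, 180, 150, 175, 160, 145, 200,
           205, 208, 203, 206, 207, 208, 209, 210, 215]}
)

# diff() is NaN in the first row (no previous value). fillna(1) marks that
# row as a rise, an assumption chosen to match the manual b in the question.
df["b"] = df["a"].diff().fillna(1).gt(0).astype(int)

# Running maximum of a, as in the answer above.
df["c"] = df["a"].cummax()

print(df)
```

Note that c depends only on a, so b is not needed to compute it; it is derived here purely because the question asked for it.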