Continuosly adding the product of 2 columns to the cumulative sum of a column

Question 1

I have stock data and I am trying to adjust share count of a hypothetical portfolio using a dividend reinvestment plan. Every time there is a dividend distribution the current share count needs to be multiplied by the dividend amount, that product then needs to be divided by the current price of the stock to return the reinvested shares.

See the sample dataframe below:

    vals = [38.46429771,
       46.01712878,
       38.89251387,
       33.89977144,
       27.66171419,
       38.03103652,
       31.26521149,
       35.82193894,
       49.47433244,
       33.60148345,]

divi = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]

shares = [30,0,0,0,0,0,0,0,0,0]

sample = pd.DataFrame(data=zip(vals, divi,shares),\
             index = range(0,10), columns=['Open', 'Dividends', 'share_count'])


    Open    Dividends   share_count
0   38.464298   0   30
1   46.017129   0   0
2   38.892514   1   0
3   33.899771   0   0
4   27.661714   0   0
5   38.031037   1   0
6   31.265211   0   0
7   35.821939   0   0
8   49.474332   1   0
9   33.601483   0   0

using the sample above we have a starting share count of 30 which would carry to iloc[1], iloc[2] would have a new value of 30.7713 (((30*1)/38.892)+(share_count) at the time), which would carry through to iloc[3] and iloc[4]. iloc[5] would calculate a new value and carry forward until iloc[8] where a new value would be calculated again.

I’m using this loop at the moment to calculate the share_count:

cum_val = []
x = max(sample.share_count)
t = []

for d, sc, op in zip(sample.Dividends, sample.share_count, sample.Open):
    divi = (x*d)/op
    t = divi+x
    cum_val.append(t)
    if t > x:
        x = t
    else:
        pass

but I assume there is a more pythonic method to solve this problem. any suggestions would be greatly appreciated.

Question 2

When I look at your example, I see that you need to compute new value only if the Dividends != 0, so you can leverage that:

def fn(share_count):
    def _inner(x):
        nonlocal share_count
        rv = share_count * x["Dividends"] / x["Open"] + share_count
        share_count = rv
        return rv

    return _inner


starting_value = sample.loc[sample["share_count"].idxmax(), "share_count"]
mask = sample["Dividends"] > 0
sample.loc[mask, "res"] = sample[mask].apply(fn(starting_value), axis=1)

sample = sample.ffill().fillna(starting_value)
print(sample)

Prints:

        Open  Dividends  share_count        res
0  38.464298          0           30  30.000000
1  46.017129          0            0  30.000000
2  38.892514          1            0  30.771357
3  33.899771          0            0  30.771357
4  27.661714          0            0  30.771357
5  38.031037          1            0  31.580468
6  31.265211          0            0  31.580468
7  35.821939          0            0  31.580468
8  49.474332          1            0  32.218789
9  33.601483          0            0  32.218789

Question 3

Assuming only the first share_count is initially non-zero, this can be changed into a vectorized cumprod:

df['out'] = (df['Open']
 .add(1).div(df['Open'])
 .mask(df['Dividends'].eq(0), 1)
 .cumprod()
 .mul(df['share_count'].iloc[0])
 )

Output:

        Open  Dividends  share_count        out
0  38.464298          0           30  30.000000
1  46.017129          0            0  30.000000
2  38.892514          1            0  30.771357
3  33.899771          0            0  30.771357
4  27.661714          0            0  30.771357
5  38.031037          1            0  31.580468
6  31.265211          0            0  31.580468
7  35.821939          0            0  31.580468
8  49.474332          1            0  32.218789
9  33.601483          0            0  32.218789

Leave a Comment Cancel reply