I have a pandas DataFrame:
from itertools import product
import pandas as pd
from random import randint
idx = list(product([1,2,3],repeat=2))
index = pd.MultiIndex.from_tuples(idx, names=["x","y"])
df = pd.DataFrame(index=index, columns=["data1"], data=[randint(1,100) for _ in range(9)])
x | y | data1 |
---|---|---|
1 | 1 | 34 |
1 | 2 | 45 |
1 | 3 | 23 |
2 | 1 | 7 |
2 | 2 | 8 |
2 | 3 | 41 |
3 | 1 | 58 |
3 | 2 | 43 |
3 | 3 | 9 |
How can I generate a new column based on the index values and the data1 values?
Something like:
def calculate(v):
    return (v.index["x"] + v.index["y"]) * v.data1

df.apply(calculate, axis=1)
You can use eval to evaluate the operation and create a new_col:
out = df.eval("new_col=(x+y)*data1")
Output:
print(out)
     data1  new_col
x y
1 1     34       68
  2     45      135
  3     23       92
2 1      7       21
  2      8       32
  3     41      205
3 1     58      232
  2     43      215
  3      9       54
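For a reproducible check, here is a minimal sketch of the same eval call. It assumes the data1 values from the question's table are hard-coded in place of the random ones, and it relies on DataFrame.eval resolving the named index levels x and y as variables alongside the columns.

```python
import pandas as pd

# Fixed values copied from the question's table, standing in for randint output
idx = pd.MultiIndex.from_product([[1, 2, 3], [1, 2, 3]], names=["x", "y"])
df = pd.DataFrame({"data1": [34, 45, 23, 7, 8, 41, 58, 43, 9]}, index=idx)

# Named index levels (x, y) can be referenced like columns inside eval
out = df.eval("new_col = (x + y) * data1")
print(out)
```

By default eval returns a new DataFrame and leaves df unchanged; passing inplace=True should add the column to df directly.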
Example
Random values make a short example hard to reproduce, so I rewrote your example with fixed data.
import pandas as pd
idx = pd.MultiIndex.from_product([[1, 2, 3], [1, 2, 3]], names=["x","y"])
data = [34, 45, 23, 7, 8, 41, 58, 43, 9]
df = pd.DataFrame(data, index=idx, columns=['data1'])
Code
First, convert the index to a DataFrame and sum across its columns (axis=1).
Then multiply the result by df['data1'].
df['data1'].mul(df.index.to_frame().sum(axis=1))
Output:
x  y
1  1     68
   2    135
   3     92
2  1     21
   2     32
   3    205
3  1    232
   2    215
   3     54
dtype: int64
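A possible variant of the same idea, sketched here as an illustration rather than as part of the answer above, pulls each level out by name with get_level_values instead of converting the whole index to a frame. The column name new_col is only an illustrative choice.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([[1, 2, 3], [1, 2, 3]], names=["x", "y"])
df = pd.DataFrame([34, 45, 23, 7, 8, 41, 58, 43, 9], index=idx, columns=["data1"])

# Pull each index level out as a flat Index, one value per row
x = df.index.get_level_values("x")
y = df.index.get_level_values("y")

# x + y is element-wise; multiplying the column by it keeps df's MultiIndex
df["new_col"] = df["data1"] * (x + y)
print(df)
```

Naming the levels explicitly may read more clearly when the index has extra levels that should not take part in the sum.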
You can achieve this by using the .apply() method with a custom function, as you mentioned. However, there is a small mistake in your provided code. Here’s the corrected code:
import pandas as pd
from itertools import product
from random import randint
idx = list(product([1, 2, 3], repeat=2))
index = pd.MultiIndex.from_tuples(idx, names=["x", "y"])
df = pd.DataFrame(index=index, columns=["data1"], data=[randint(1, 100) for _ in range(9)])
def calculate(v):
    # With axis=1, v is a row Series; v.name is its index tuple (x, y)
    x, y = v.name
    return (x + y) * v["data1"]

df["result"] = df.apply(calculate, axis=1)
print(df)
This code generates a new column “result” from the index values and the “data1” values. With axis=1, calculate receives each row as a pandas Series whose name attribute holds that row's index tuple (x, y); it unpacks the tuple, applies your formula, and the returned values are assigned to the “result” column.
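One way to make the index values visible to a row-wise function is to move them into ordinary columns first with reset_index. The sketch below assumes fixed data1 values (taken from the question's table) so the result is reproducible; the names flat and result_df are hypothetical.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([[1, 2, 3], [1, 2, 3]], names=["x", "y"])
df = pd.DataFrame({"data1": [34, 45, 23, 7, 8, 41, 58, 43, 9]}, index=idx)

# Turn the index levels x and y into regular columns
flat = df.reset_index()

# Each row now carries x, y and data1 as plain fields
flat["result"] = flat.apply(lambda r: (r["x"] + r["y"]) * r["data1"], axis=1)

# Restore the original MultiIndex
result_df = flat.set_index(["x", "y"])
print(result_df)
```

Row-wise apply calls the Python function once per row, so for large frames a vectorized expression is likely to be faster.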