How to generate column values using row index values in pandas?

Question 1

I have a pandas DataFrame:

from itertools import product
import pandas as pd
from random import randint

idx = list(product([1,2,3],repeat=2))
index = pd.MultiIndex.from_tuples(idx, names=["x","y"])
df = pd.DataFrame(index=index, columns=["data1"], data=[randint(1,100) for _ in range(9)])

x	y	data1
1	1	34
	2	45
	3	23
2	1	7
	2	8
	3	41
3	1	58
	2	43
	3	9

How can I generate a new column based on the index values and the data1 values?

Something like:

def calculate(v):
    return (v.index["x"] + v.index["y"]) * v.data1

df.apply(calculate,axis=1)

Question 2

You can use DataFrame.eval to create new_col; the expression can refer to the index level names x and y as if they were columns:

out = df.eval("new_col=(x+y)*data1")

Output:

print(out)

     data1  new_col
x y                
1 1     34       68
  2     45      135
  3     23       92
2 1      7       21
  2      8       32
  3     41      205
3 1     58      232
  2     43      215
  3      9       54
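For a reproducible check, here is a minimal sketch of the same eval call using the fixed data1 values shown in the output above in place of random ones. It also illustrates that eval returns a new DataFrame by default and leaves df unchanged.

```python
import pandas as pd

# Fixed values copied from the printed output above, so the result is reproducible.
idx = pd.MultiIndex.from_product([[1, 2, 3], [1, 2, 3]], names=["x", "y"])
df = pd.DataFrame({"data1": [34, 45, 23, 7, 8, 41, 58, 43, 9]}, index=idx)

# x and y are index level names; eval resolves them alongside the data1 column.
out = df.eval("new_col = (x + y) * data1")

print(out)
print(df.columns.tolist())  # df itself has no new_col; only out does
```

To keep the column on the original frame, assign the result back (df = df.eval(...)).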

Question 3

Example

I don't think a random sample is a good choice for a short example in a question, because the values change on every run.

So I rewrote your example with fixed values.

import pandas as pd
idx = pd.MultiIndex.from_product([[1, 2, 3], [1, 2, 3]], names=["x","y"])
data = [34, 45, 23, 7, 8, 41, 58, 43, 9]
df = pd.DataFrame(data, index=idx, columns=['data1'])

Code

First, convert the index to a DataFrame and sum it along axis=1.

Then multiply the result by df['data1'].

df['data1'].mul(df.index.to_frame().sum(axis=1))

Output:

x  y
1  1     68
   2    135
   3     92
2  1     21
   2     32
   3    205
3  1    232
   2    215
   3     54
dtype: int64
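Summing index.to_frame() adds every index level. If the formula should use only particular levels, or combine them in some other way, one option is to pull each level out by name with get_level_values. The sketch below assumes the same fixed data as above; the variable names x, y and result are only illustrative.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([[1, 2, 3], [1, 2, 3]], names=["x", "y"])
data = [34, 45, 23, 7, 8, 41, 58, 43, 9]
df = pd.DataFrame(data, index=idx, columns=["data1"])

# Each call returns the values of one index level, one entry per row.
x = df.index.get_level_values("x")
y = df.index.get_level_values("y")

# Element-wise arithmetic; the result keeps df's MultiIndex.
result = df["data1"] * (x + y)
print(result)
```

Because x and y are separate objects here, the expression is easy to change, for example to (x - y) or to use only one of the levels.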

Question 4

You can achieve this by using the .apply() method with a custom function, as you mentioned. However, there is a small mistake in your provided code. Here’s the corrected code:

import pandas as pd
from itertools import product
from random import randint

idx = list(product([1, 2, 3], repeat=2))
index = pd.MultiIndex.from_tuples(idx, names=["x", "y"])
df = pd.DataFrame(index=index, columns=["data1"], data=[randint(1, 100) for _ in range(9)])


def calculate(v):
    # With axis=1, v is one row; v.name holds that row's index tuple (x, y).
    return (v.name[0] + v.name[1]) * v["data1"]

df["result"] = df.apply(calculate, axis=1)
print(df)

This code generates a new column “result” from the index values and the “data1” values. With axis=1, the calculate function receives each row as a pandas Series whose name attribute is that row's index tuple (x, y), so the function reads the index values from v.name and the data value from v["data1"]. The result is then assigned to the “result” column in the DataFrame.
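To see what apply actually hands to the function, it can help to inspect a single row first. The sketch below uses fixed values instead of random ones (an assumption made for reproducibility) and shows that the row's labels are the column names, while the MultiIndex entry is stored in the row's name.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([[1, 2, 3], [1, 2, 3]], names=["x", "y"])
df = pd.DataFrame({"data1": [34, 45, 23, 7, 8, 41, 58, 43, 9]}, index=idx)

# A single row, as apply(axis=1) would pass it to the function.
first = df.iloc[0]
print(first.index.tolist())  # the column labels of the row
print(first.name)            # the (x, y) index tuple of the row

def calculate(row):
    x, y = row.name  # unpack the index tuple
    return (x + y) * row["data1"]

result = df.apply(calculate, axis=1)
print(result)
```

Row-wise apply calls the Python function once per row, so for large frames a vectorized expression over the index levels is usually the faster choice.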
