Say, I have a dataset in the following DataFrame:
df=pl.DataFrame({
'x':['a','a','b','b'],
'y':['b','c','c','a'],
'value':[3,5,1,4]
})
df
shape: (4, 3)
┌─────┬─────┬───────┐
│ x ┆ y ┆ value │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞═════╪═════╪═══════╡
│ a ┆ b ┆ 3 │
│ a ┆ c ┆ 5 │
│ b ┆ c ┆ 1 │
│ b ┆ a ┆ 4 │
└─────┴─────┴───────┘
Now, I’d like to add a column to this dataframe that would contain the inverse
value. I define an inverse value as inverse(x, y) == value(y, x)
. E.g., from
the example above, inverse (a, b) == value(b, a) == 4
.
If value(y, x)
didn’t exist then inverse(x, y)
would be given the default
value of 0
.
In other words, I’d like to add an inverse
column such as I’d end up with
something like this:
shape: (4, 4)
┌─────┬─────┬───────┬─────────┐
│ x ┆ y ┆ value ┆ inverse │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞═════╪═════╪═══════╪═════════╡
│ a ┆ b ┆ 3 ┆ 4 │
│ a ┆ c ┆ 5 ┆ 0 │
│ b ┆ c ┆ 1 ┆ 0 │
│ b ┆ a ┆ 4 ┆ 3 │
└─────┴─────┴───────┴─────────┘
Is this doable in an easy and optimal way? Preferably with expressions?
Thanks a lot in advance.
You could join
it to itself with aliases and then do fill_null(0)
.
df.join(
df.select(
y="x",
x="y",
inverse="value"
),
on=["x","y"], how="left"
).fill_null(0)
shape: (4, 4)
┌─────┬─────┬───────┬─────────┐
│ x ┆ y ┆ value ┆ inverse │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞═════╪═════╪═══════╪═════════╡
│ a ┆ b ┆ 3 ┆ 4 │
│ a ┆ c ┆ 5 ┆ 0 │
│ b ┆ c ┆ 1 ┆ 0 │
│ b ┆ a ┆ 4 ┆ 3 │
└─────┴─────┴───────┴─────────┘