How to expand multiple list column in rows in python

Question 1

I have a DataFrame like this:

ID	MS	DS
654	1500,10000,20000,30000	60,365,730
131	1500,10000,20000	60,365,730
598	1500,10000,20000,30000	60,365,730

The desired output looks like this:

ID	MS	DS
654	1500	60
654	10000	365
654	20000	730
654	30000	Nan
131	1500	60
131	10000	365
131	20000	730
598	1500	60

I have tried the following code:

from itertools import zip_longest

df.apply(lambda row: list(zip_longest(row['MS'], 
                                  row['DS'], 
                                  fillvalue="Nan")),axis = 1)

This did not work as expected, as it creates a Series with list values like this: [('1500,10000,20000,30000', '60,365,730')], [('1500,10000,20000', '60,365,730')].

Question 2

Here’s one approach:

cols = ['MS', 'DS']

dfs = [df[c].explode().str.split(',', expand=True)
       .astype('Int64')
       .stack()
       .rename(c) 
       for c in cols]

out = (pd.concat(dfs, axis=1)
       .droplevel(1)
       .join(df['ID'])
       .loc[:, df.columns]
       )

out

    ID     MS    DS
0  654   1500    60
0  654  10000   365
0  654  20000   730
0  654  30000  <NA>
1  131   1500    60
1  131  10000   365
1  131  20000   730
2  598   1500    60
2  598  10000   365
2  598  20000   730
2  598  30000  <NA>

Explanation

Inside a list comprehension, for each column of ['MS', 'DS']:
- use Series.explode to get list elements into multiple rows;
- use Series.str.split to split on ',' and set expand to True to get the result in multiple columns;
- chain Series.astype with Int64 to turn what will be string values into proper integers, while allowing for NaN values (the shorter sequences will have these for columns not “reached”);
- apply df.stack to get a pd.Series result (with column values now as a second level index);
- add Series.rename to restore the original column name.
Join both Series, add non-exploded column, and re-order:
- use pd.concat along axis=1 (columns)
- apply df.droplevel to drop the added index level;
- use df.join to re-add the non-exploded column ID;
- pass df.columns to df.loc to reset the column order.

Data used

import pandas as pd

data = {
    'ID': [654, 131, 598],
    'MS': [['1500,10000,20000,30000'], ['1500,10000,20000'], 
           ['1500,10000,20000,30000']],
    'DS': [['60,365,730'], ['60,365,730'], 
           ['60,365,730']]
}

df = pd.DataFrame(data)
df

    ID                        MS            DS
0  654  [1500,10000,20000,30000]  [60,365,730]
1  131        [1500,10000,20000]  [60,365,730]
2  598  [1500,10000,20000,30000]  [60,365,730]

Question 3

You can use str.split and explode:

df[["MS","DS"]] = df[["MS","DS"]].apply(lambda x: x.str.split(","),axis=1)
df["final"] = df[["MS","DS"]].apply(lambda x: list(zip_longest(x["MS"], x["DS"])),axis=1)
df = df.explode("final")
df[["MS","DS"]] = df["final"].values.tolist()

Out:

    ID     MS    DS
0  654   1500    60
0  654  10000   365
0  654  20000   730
0  654  30000  None
1  131   1500    60
1  131  10000   365
1  131  20000   730
2  598   1500    60
2  598  10000   365
2  598  20000   730
2  598  30000  None

Leave a Comment Cancel reply