How to expand multiple list column in rows in python

I have a DataFrame like this:

ID MS DS
654 1500,10000,20000,30000 60,365,730
131 1500,10000,20000 60,365,730
598 1500,10000,20000,30000 60,365,730

The desired output looks like this:

ID MS DS
654 1500 60
654 10000 365
654 20000 730
654 30000 Nan
131 1500 60
131 10000 365
131 20000 730
598 1500 60

I have tried the following code:

from itertools import zip_longest

df.apply(lambda row: list(zip_longest(row['MS'], 
                                  row['DS'], 
                                  fillvalue="Nan")),axis = 1)

This did not work as expected, as it creates a Series with list values like this: [('1500,10000,20000,30000', '60,365,730')], [('1500,10000,20000', '60,365,730')].

  • The values are in list format in both the columns like this [1500,10000,20000,30000], i had to remove the list as stackoverflow was not letting me post the question saying Your post appears to contain code that is not properly formatted as code. Please indent all code by 4 spaces using the code toolbar button or the CTRL+K keyboard shortcut. For more editing help, click the [?] toolbar icon.

    – 

Here’s one approach:

cols = ['MS', 'DS']

dfs = [df[c].explode().str.split(',', expand=True)
       .astype('Int64')
       .stack()
       .rename(c) 
       for c in cols]

out = (pd.concat(dfs, axis=1)
       .droplevel(1)
       .join(df['ID'])
       .loc[:, df.columns]
       )

out

    ID     MS    DS
0  654   1500    60
0  654  10000   365
0  654  20000   730
0  654  30000  <NA>
1  131   1500    60
1  131  10000   365
1  131  20000   730
2  598   1500    60
2  598  10000   365
2  598  20000   730
2  598  30000  <NA>

Explanation

  1. Inside a list comprehension, for each column of ['MS', 'DS']:
    • use Series.explode to get list elements into multiple rows;
    • use Series.str.split to split on ',' and set expand to True to get the result in multiple columns;
    • chain Series.astype with Int64 to turn what will be string values into proper integers, while allowing for NaN values (the shorter sequences will have these for columns not “reached”);
    • apply df.stack to get a pd.Series result (with column values now as a second level index);
    • add Series.rename to restore the original column name.
  2. Join both Series, add non-exploded column, and re-order:
    • use pd.concat along axis=1 (columns)
    • apply df.droplevel to drop the added index level;
    • use df.join to re-add the non-exploded column ID;
    • pass df.columns to df.loc to reset the column order.

Data used

import pandas as pd

data = {
    'ID': [654, 131, 598],
    'MS': [['1500,10000,20000,30000'], ['1500,10000,20000'], 
           ['1500,10000,20000,30000']],
    'DS': [['60,365,730'], ['60,365,730'], 
           ['60,365,730']]
}

df = pd.DataFrame(data)
df

    ID                        MS            DS
0  654  [1500,10000,20000,30000]  [60,365,730]
1  131        [1500,10000,20000]  [60,365,730]
2  598  [1500,10000,20000,30000]  [60,365,730]

You can use str.split and explode:

df[["MS","DS"]] = df[["MS","DS"]].apply(lambda x: x.str.split(","),axis=1)
df["final"] = df[["MS","DS"]].apply(lambda x: list(zip_longest(x["MS"], x["DS"])),axis=1)
df = df.explode("final")
df[["MS","DS"]] = df["final"].values.tolist()

Out:

    ID     MS    DS
0  654   1500    60
0  654  10000   365
0  654  20000   730
0  654  30000  None
1  131   1500    60
1  131  10000   365
1  131  20000   730
2  598   1500    60
2  598  10000   365
2  598  20000   730
2  598  30000  None

Leave a Comment