I have a source dataframe included below (df) that I would like to reshape into a 2 dimensional dataframe (outputdf) with the first column as all unique values that exist in the df and a column for a select set of unique values in column B which I have loaded into a list (list). I would then populate this outputdf with the corresponding values in column c
df:
Column A | Column B | Column C |
---|---|---|
Cell 1 | Cell A | 1 |
Cell 2 | Cell A | 2 |
Cell 3 | Cell A | 3 |
Cell 1 | Cell B | 4 |
Cell 2 | Cell B | 5 |
Cell 3 | Cell B | 6 |
Cell 1 | Cell C | 7 |
Cell 2 | Cell C | 8 |
Cell 3 | Cell C | 9 |
list = [Cell A, Cell B)
I am unsure how to best do this in a looping manner as I would use this for a large dataframe with many entries. My goal would be something along the lines of the following though I know this logic would not work:
for x in list:
outputdf[x] = outputdf.merge(df, on=['ColumnA', x], how='left').set_index('ColumnA')
to yield the following outputdf
outputdf:
Column A | Cell A | Cell B |
---|---|---|
Cell 1 | 1 | 4 |
Cell 2 | 2 | 5 |
Cell 3 | 3 | 6 |
Any help is appreciated.
I think you can use pivot_table method to turn values in Column B into columns and then use rename_axis and droplevel to clean up the column names
test.csv
Column A,Column B,Column C
Cell 1,Cell A,1
Cell 2,Cell A,2
Cell 3,Cell A,3
Cell 1,Cell B,4
Cell 2,Cell B,5
Cell 3,Cell B,6
Cell 1,Cell C,7
Cell 2,Cell C,8
Cell 3,Cell C,9
import pandas as pd
df = pd.read_csv('test.csv')
table = pd.pivot_table(df, index = 'Column A', columns = ['Column B'])
table = table.rename_axis([None, None], axis=1)
table.columns = table.columns.droplevel()
The code would return like below
Cell A Cell B Cell C
Column A
Cell 1 1 4 7
Cell 2 2 5 8
Cell 3 3 6 9