How to Extract Every 6th Columns of a Horizontally Wide Dataframe and Append as Rows into New Cleaner DataFrame

Question 1

I have a horizontally large CSV file with 606 columns. I have imported the csv into Pandas dataframe as

df = pd.read_csv("groups.csv")

but I need to change this weird horizontal length! to a more readable vertically table! Can you please let me know how I can Extract every 6 columns in a loop from df, skip the header and append to df_target

df_target = pd.DataFrame(columns=['GroupA', 'GroupB', 'GroupC','GroupD', 'GroupE', 'GroupF'])

As I said it is important to extract every 6 columns without headers

Question 2

Use numpy.reshape with 6 and -1 for arbirtary number of new rows and 6 columns. Solution working if modulo number of columns is 0.

df = pd.read_csv("groups.csv")

print (len(df.columns))
606
print (606 % 6)
0


df_target = pd.DataFrame(df.to_numpy().reshape(6, -1),
                       columns=['GroupA', 'GroupB', 'GroupC','GroupD', 'GroupE', 'GroupF'])

Sample:

np.random.seed(123)
df = pd.DataFrame(np.random.randint(10, size=(3,12)))
print (df)
   0   1   2   3   4   5   6   7   8   9   10  11
0   2   2   6   1   3   9   6   1   0   1   9   0
1   0   9   3   4   0   0   4   1   7   3   2   4
2   7   2   4   8   0   7   9   3   4   6   1   5

df_target = pd.DataFrame(df.to_numpy().reshape(6, -1),
                       columns=['GroupA', 'GroupB', 'GroupC','GroupD', 'GroupE', 'GroupF'])
print (df_target)
   GroupA  GroupB  GroupC  GroupD  GroupE  GroupF
0       2       2       6       1       3       9
1       6       1       0       1       9       0
2       0       9       3       4       0       0
3       4       1       7       3       2       4
4       7       2       4       8       0       7
5       9       3       4       6       1       5

EDIT: If there is modulo not 0 is possible use pandas solution with MultiIndex created by modulo % and integer division by // with 6 and reshape by DataFrame.stack:

print (len(df.columns))
5252
print (5252 % 6)
2

#sample
np.random.seed(123)
df = pd.DataFrame(np.random.randint(10, size=(3,10)))
print (df)
   0  1  2  3  4  5  6  7  8  9
0  2  2  6  1  3  9  6  1  0  1
1  9  0  0  9  3  4  0  0  4  1
2  7  3  2  4  7  2  4  8  0  7

a = np.arange(len(df.columns))

cols = ['GroupA', 'GroupB', 'GroupC','GroupD', 'GroupE', 'GroupF']
df_target = (df.set_axis([a % 6, a // 6], axis=1)
                .stack().set_axis(cols, axis=1).reset_index(drop=True))
print (df_target)
   GroupA  GroupB  GroupC  GroupD  GroupE  GroupF
0       2       2       6       1     3.0     9.0
1       6       1       0       1     NaN     NaN
2       9       0       0       9     3.0     4.0
3       0       0       4       1     NaN     NaN
4       7       3       2       4     7.0     2.0
5       4       8       0       7     NaN     NaN

Leave a Comment Cancel reply