I have a horizontally large CSV file with 606 columns. I have imported the csv into Pandas dataframe as
df = pd.read_csv("groups.csv")
but I need to change this weird horizontal length! to a more readable vertically table! Can you please let me know how I can Extract every 6 columns in a loop from df
, skip the header and append to df_target
df_target = pd.DataFrame(columns=['GroupA', 'GroupB', 'GroupC','GroupD', 'GroupE', 'GroupF'])
As I said it is important to extract every 6 columns without headers
Use numpy.reshape
with 6
and -1
for arbirtary number of new rows and 6 columns. Solution working if modulo number of columns is 0
.
df = pd.read_csv("groups.csv")
print (len(df.columns))
606
print (606 % 6)
0
df_target = pd.DataFrame(df.to_numpy().reshape(6, -1),
columns=['GroupA', 'GroupB', 'GroupC','GroupD', 'GroupE', 'GroupF'])
Sample:
np.random.seed(123)
df = pd.DataFrame(np.random.randint(10, size=(3,12)))
print (df)
0 1 2 3 4 5 6 7 8 9 10 11
0 2 2 6 1 3 9 6 1 0 1 9 0
1 0 9 3 4 0 0 4 1 7 3 2 4
2 7 2 4 8 0 7 9 3 4 6 1 5
df_target = pd.DataFrame(df.to_numpy().reshape(6, -1),
columns=['GroupA', 'GroupB', 'GroupC','GroupD', 'GroupE', 'GroupF'])
print (df_target)
GroupA GroupB GroupC GroupD GroupE GroupF
0 2 2 6 1 3 9
1 6 1 0 1 9 0
2 0 9 3 4 0 0
3 4 1 7 3 2 4
4 7 2 4 8 0 7
5 9 3 4 6 1 5
EDIT: If there is modulo not 0
is possible use pandas solution with MultiIndex
created by modulo %
and integer division by //
with 6
and reshape by DataFrame.stack
:
print (len(df.columns))
5252
print (5252 % 6)
2
#sample
np.random.seed(123)
df = pd.DataFrame(np.random.randint(10, size=(3,10)))
print (df)
0 1 2 3 4 5 6 7 8 9
0 2 2 6 1 3 9 6 1 0 1
1 9 0 0 9 3 4 0 0 4 1
2 7 3 2 4 7 2 4 8 0 7
a = np.arange(len(df.columns))
cols = ['GroupA', 'GroupB', 'GroupC','GroupD', 'GroupE', 'GroupF']
df_target = (df.set_axis([a % 6, a // 6], axis=1)
.stack().set_axis(cols, axis=1).reset_index(drop=True))
print (df_target)
GroupA GroupB GroupC GroupD GroupE GroupF
0 2 2 6 1 3.0 9.0
1 6 1 0 1 NaN NaN
2 9 0 0 9 3.0 4.0
3 0 0 4 1 NaN NaN
4 7 3 2 4 7.0 2.0
5 4 8 0 7 NaN NaN