Say I have a big dataset called fulldata
with three columns, a, b and c, built like this:
df1 <- data.frame(
a = rnorm(4, 10), b = rnorm(4, 6), c = 1
)
df2 <- data.frame(
a = rnorm(6, 10), b = rnorm(6, 9), c = 2
)
df3 <- data.frame(
a = rnorm(8, 8), b = rnorm(8, 9), c = 3
)
fulldata <- rbind(df1, df2, df3)
I also have subsets based on the value of c, such that df1
holds the rows where c = 1 … and so on up to c = 3. I have vectors referencing these subsets and the column names, as such:
c_values <- c("df1", "df2", "df3")
columns <- c("a", "b", "c")
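A minimal sketch of how name vectors like these could be turned back into data, assuming the data frames exist in the current environment. The toy data frames here are stand-ins, not the data above. `mget()` returns a named list of the objects whose names are given, and `[[ ]]` selects a column by a name held in a variable.

```r
# Toy stand-ins for the real subsets (illustrative values only)
df1 <- data.frame(a = c(1, 2), b = c(3, 4), c = 1)
df2 <- data.frame(a = c(5, 6), b = c(7, 8), c = 2)
c_values <- c("df1", "df2")
columns <- c("a", "b", "c")

# Look the data frames up by name; the result is a named list
subsets <- mget(c_values)

# Select a column by a name stored in a variable
column <- columns[1]
first_a <- subsets[["df1"]][[column]]
```

Keeping the subsets in a named list means the names can double as labels later on.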
Essentially, I want to create five-number summary tables for each column (a to c) and each subset, like you would get with summary(x) plus a mean, with columns for min, Q1, median, and so on, all the way to the mean. There should be another column indicating the c value for the subset, and another indicating which column (a, b or c) is being summarised. Finally, I want one table with no subsetting on c, covering the whole of fulldata.
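To make the target layout concrete, here is a rough sketch of a single row of such a table. The helper name `summarise_one` and the output column names are hypothetical, chosen only for illustration. It relies on `quantile()` and `mean()` from base R. Note that `summary()` on a numeric vector already reports the mean alongside the five-number summary.

```r
# Hypothetical helper: one summary row for one vector,
# labelled with the subset and the column it came from
summarise_one <- function(x, column, subset) {
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  data.frame(
    subset = subset, column = column,
    min = q[1], q1 = q[2], median = q[3], q3 = q[4], max = q[5],
    mean = mean(x)
  )
}

# Illustrative input, not the data from the question
row <- summarise_one(c(1, 2, 3, 4, 10), column = "a", subset = "df1")
```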
Edit: I’m sorry, I tried to format this table into the question, but it didn’t appear after I posted it.
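for (column in columns) {
  # df1[[column]] selects the column named by the loop variable;
  # df1$columns would look for a column literally called "columns"
  summary1 <- c(summary(df1[[column]]), mean(df1[[column]]))
  summary2 <- c(summary(df2[[column]]), mean(df2[[column]]))
  summary3 <- c(summary(df3[[column]]), mean(df3[[column]]))
} # and then bind the summaries together somehow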
for (c_value in c_values) {
  for (column in columns) {
    # summarise `column` of the subset named by `c_value`
  }
} # bind the whole table together
In practice, there are far more than 3 subsets and 3 column names, so I want to be able to cycle through them with a quick loop, hence the name vectors from before, but I can't seem to get the syntax to work.
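One possible shape for such a loop, offered as a sketch rather than a definitive answer. It assumes the subsets live in the current environment under the names in c_values. The label "all" for the unsubsetted data, the output column names, and the `set.seed()` call are assumptions added for illustration and reproducibility.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
df1 <- data.frame(a = rnorm(4, 10), b = rnorm(4, 6), c = 1)
df2 <- data.frame(a = rnorm(6, 10), b = rnorm(6, 9), c = 2)
df3 <- data.frame(a = rnorm(8, 8), b = rnorm(8, 9), c = 3)
fulldata <- rbind(df1, df2, df3)

c_values <- c("df1", "df2", "df3")
columns <- c("a", "b", "c")

# Named list of subsets, plus the full data under the label "all"
subsets <- c(mget(c_values), list(all = fulldata))

rows <- list()
for (subset_name in names(subsets)) {
  for (column in columns) {
    x <- subsets[[subset_name]][[column]]  # column looked up by name
    q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
    rows[[length(rows) + 1]] <- data.frame(
      subset = subset_name, column = column,
      min = q[1], q1 = q[2], median = q[3], q3 = q[4], max = q[5],
      mean = mean(x)
    )
  }
}

# One row per subset/column pair, stacked into a single table
summary_table <- do.call(rbind, rows)
```

Collecting the rows in a list and binding once at the end avoids growing a data frame inside the loop, and adding more subsets or columns only means extending the two name vectors.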