How to calculate standard error and CI to plot in R

Question

I am first calculating the percentage of respondents across different demographics who graduated from high school, based on their program status. This code gets me those percents:

d_perc <- d %>% 
  group_by(group, levels, program_cat, highschool) %>% 
  summarize(n = n()) %>% 
  mutate(percent = n/sum(n)*100) %>% 
  select(-n)

Next, I want to calculate error terms around these percents. What is the best way to calculate the SEs and the corresponding 95% CIs? (My ultimate goal is to plot these together using geom_point() and geom_errorbar(), though I already have code to do this.)

I tried something like:

d_perc$se <- sqrt(d_perc$percent*(1-d_perc$percent)/d_perc$percent)

Which would then be followed by something like + and - 1.96*d_perc$se to get the upper and lower estimate. However, when I try the above, I just get a series of NaNs for the se column.
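
For reference, a likely cause of the NaNs: percent is on a 0-100 scale, so percent*(1-percent)/percent reduces to 1-percent, which is negative for any percent above 1, and sqrt() of a negative number returns NaN. The usual standard error of a proportion is sqrt(p*(1-p)/n), with p on a 0-1 scale and n the size of the group the proportion is taken from, so the count has to be kept rather than dropped with select(-n). Below is a minimal sketch under those assumptions. The toy data frame and the column names total, p, lower and upper are hypothetical and not part of my real data.

```r
library(dplyr)

# Hypothetical toy data standing in for `d`: 10 Female and 10 Male respondents
toy <- tibble(
  group       = "gender",
  levels      = rep(c("Female", "Male"), each = 10),
  program_cat = "1",
  highschool  = c(rep("yes", 6), rep("no", 4), rep("yes", 3), rep("no", 7))
)

toy_perc <- toy %>%
  group_by(group, levels, program_cat, highschool) %>%
  summarize(n = n(), .groups = "drop_last") %>%  # still grouped by group, levels, program_cat
  mutate(
    total   = sum(n),                   # denominator: respondents in this demographic/program cell
    p       = n / total,                # proportion on a 0-1 scale
    se      = sqrt(p * (1 - p) / total),
    percent = 100 * p,
    lower   = 100 * (p - 1.96 * se),    # normal-approximation (Wald) 95% interval
    upper   = 100 * (p + 1.96 * se)
  ) %>%
  ungroup()
```

The lower and upper columns can then be mapped to ymin and ymax in geom_errorbar(). With small cells the normal approximation can give limits below 0 or above 100, in which case an interval from prop.test() or binom.test() may be a better choice.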

Data here (sorry for the large data; I used head(100) to get somewhat more realistic percents by group):

d <- structure(list(highschool= structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L), levels = c("no", 
"yes"), class = "factor"), program_cat = structure(c(2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L), levels = c("0", "1", "2"), class = "factor"), group = c("gender", 
"race", "cohort", "gender", "race", "cohort", "gender", "race", 
"cohort", "gender", "race", "cohort", "gender", "race", "cohort", 
"gender", "race", "cohort", "gender", "race", "cohort", "gender", 
"race", "cohort", "gender", "race", "cohort", "gender", "race", 
"cohort", "gender", "race", "cohort", "gender", "race", "cohort", 
"gender", "race", "cohort", "gender", "race", "cohort", "gender", 
"race", "cohort", "gender", "race", "cohort", "gender", "race", 
"cohort", "gender", "race", "cohort", "gender", "race", "cohort", 
"gender", "race", "cohort", "gender", "race", "cohort", "gender", 
"race", "cohort", "gender", "race", "cohort", "gender", "race", 
"cohort", "gender", "race", "cohort", "gender", "race", "cohort", 
"gender", "race", "cohort", "gender", "race", "cohort", "gender", 
"race", "cohort", "gender", "race", "cohort", "gender", "race", 
"cohort", "gender", "race", "cohort", "gender", "race", "cohort", 
"gender"), levels = structure(c(1L, 3L, 7L, 2L, 5L, 7L, 1L, 3L, 
6L, 2L, 4L, 6L, 1L, 5L, 7L, 1L, 3L, 7L, 1L, 3L, 6L, 1L, 3L, 6L, 
1L, 3L, 7L, 1L, 5L, 6L, 2L, 5L, 7L, 1L, 5L, 6L, 1L, 3L, 6L, 2L, 
3L, 7L, 1L, 3L, 6L, 1L, 4L, 6L, 1L, 5L, 6L, 1L, 5L, 6L, 1L, 4L, 
6L, 2L, 3L, 6L, 2L, 3L, 7L, 1L, 3L, 7L, 1L, 3L, 6L, 1L, 4L, 7L, 
1L, 4L, 7L, 1L, 3L, 7L, 1L, 3L, 7L, 1L, 4L, 7L, 1L, 3L, 7L, 1L, 
3L, 6L, 1L, 3L, 7L, 2L, 3L, 7L, 2L, 5L, 6L, 2L), levels = c("Female", 
"Male", "Black", "Hispanic", "White", "CohortA", "CohortB"), class = "factor")), row.names = c(NA, 
-100L), class = c("tbl_df", "tbl", "data.frame"))
