With gt_summary is there a way to limit each variable sets for paired comparison?

Question

Given a dataset with missing values, I want to produce means, standard deviations, etc. for a paired comparison. Thus any rows of data that are not matched by a given ID should not be included in the statistics. Here is a minimal example of what I am trying to do:

library(gtsummary)

D <- tibble(Set = c(rep('A',4), rep('B', 4)),
            ID = c(1, 2, 3, 4, 1, 2, 3, 4),
            V1 = c(NA, 1, 2, 3, 1, 3, 2, 4),
            V2 = c(4, NA, NA, 5, 6, 7, 8, 9))

D |> 
  tbl_summary(
    by = Set, 
    type = list(everything() ~ 'continuous'),
    statistic = all_continuous() ~ "{mean} <{N_nonmiss}>",
    include = -ID,
    missing = 'no') |>
  add_n() |>
  add_p(test = everything() ~ 'paired.t.test',
        group = ID)

which yields the following:

This yields the correct p values but for V1 I would like the means of A to be mean(1,2,3) [it is] and B to be mean(3,2,4) [instead it is mean(1,3,2,4)] as only IDs 2,3,4 should be used since ID=1 does not have a real paired set. N should be 3 for both A and B. Likewise for V2 I would like the means of A to be mean(4,5) [it is] and B to be mean(6,9) as only IDs 1 and 4 have paired values [instead it is mean(6,7,8,9)]. N should be 2 for both A and B.

If this cannot be done within the table functions, then there is likely a way to change the real value of the missing pair also into an NA value based on the ID column. This would make the data work correctly within the table function but I haven’t managed to solve that approach either.

Leave a Comment Cancel reply