I’m looking for some advice on analysing this data. I think I need a multilevel model.
I have 58 participants, each with progesterone and a factor (Factor1) measured at 4 timepoints. I would like to fit a model to see how progesterone predicts Factor1, while taking into account the repeated measurements per individual (4 per person).
factor1~progesterone
I also know that progesterone will increase from time 1 to time 3 and then drop at time 4, for all participants.
I was thinking of a model like
mod <- lmer(Factor1 ~ timepoint + progesterone + (1 | ID), data = df)
which takes ID into account, but I would really love some advice.
Here is similar data to what I am using:
library(lme4) # provides lmer()
set.seed(123)
# Number of participants and timepoints
num_part <- 58
num_timepoints <- 4
# Generate progesterone data
progesterone <- rnorm(num_part * num_timepoints, mean = 0, sd = 1)
progesterone[is.na(progesterone)] <- -3.8 # Replace NA with the specified value
# Ensure that the generated progesterone values fall within the desired range
progesterone <- pmin(pmax(progesterone, -3.8), 1.4)
# Generate Factor1 data
factor1 <- rnorm(num_part * num_timepoints, mean = 0.03, sd = 1)
factor1 <- pmin(pmax(factor1, -1.840773), 2.149765) # Ensure values are within the desired range
# Create a data frame
df <- data.frame(
  ID = rep(1:num_part, each = num_timepoints),
  timepoint = rep(1:num_timepoints, times = num_part),
  progesterone = progesterone,
  Factor1 = factor1
)
This question would probably be a better fit on stats.stackexchange, since it’s more about statistical methodology than about programming. I think any answers will need a little more context. Do you think a linear relationship between progesterone and your factor is plausible? If not, maybe use a GAM instead. You say you expect progesterone to increase through t3 and then decrease at t4. Are there other consistent differences across time? It’s hard to say whether time should be included at all and, if so, whether it should be numeric or categorical.
You’re quite right, sorry. I have put it here: stats.stackexchange.com/questions/635059/… A linear relationship between them is plausible, and no other consistent differences are expected beyond progesterone increasing through t3 and then dropping. Thanks 🙂
Your model
mod <- lmer(Factor1 ~ timepoint + progesterone + (1 | ID), data = df)
specifies fixed effects for timepoint and progesterone, and a random intercept for each participant. The random intercept accounts for the repeated measurements within individuals. Given your additional information that progesterone increases from time 1 to time 3 and then drops at time 4 for all participants, did you try to incorporate this into the model? If not, you could try something like:
mod <- lmer(Factor1 ~ timepoint + progesterone + progesterone:timepoint + (1 | ID), data = df)
The suggested model includes fixed effects for timepoint, progesterone, and their interaction, as well as a random intercept for each participant. The interaction lets the estimated effect of progesterone on Factor1 differ across timepoints. I would also suggest checking the model’s assumptions, such as normality of residuals and homoscedasticity, and then choosing between candidate models based on how well each fits your data.
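As a rough sketch of those last two steps (checking assumptions, then comparing the models), something like the following could work. It rebuilds a simplified stand-in for the simulated data from the question so that it runs on its own. The names mod_main, mod_int and cmp are just illustrative, and fitting with REML = FALSE is my own choice here, made so that the likelihood-ratio comparison of models with different fixed effects is valid. Because the stand-in data are pure noise, lmer() may report a singular fit; that is expected for this example and says nothing about your real data.

```r
library(lme4)

# Stand-in data with the same structure as in the question (simulated, not real)
set.seed(123)
num_part <- 58
num_timepoints <- 4
df <- data.frame(
  ID = factor(rep(seq_len(num_part), each = num_timepoints)),
  timepoint = rep(seq_len(num_timepoints), times = num_part),
  progesterone = rnorm(num_part * num_timepoints),
  Factor1 = rnorm(num_part * num_timepoints, mean = 0.03)
)

# Fit both candidate models by maximum likelihood (assumption: REML = FALSE,
# so that models differing in their fixed effects can be compared)
mod_main <- lmer(Factor1 ~ timepoint + progesterone + (1 | ID),
                 data = df, REML = FALSE)
mod_int <- lmer(Factor1 ~ timepoint + progesterone + progesterone:timepoint +
                  (1 | ID), data = df, REML = FALSE)

# Homoscedasticity: residuals vs fitted values should show no funnel or trend
plot(fitted(mod_main), resid(mod_main),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality of residuals: points should lie close to the reference line
qqnorm(resid(mod_main))
qqline(resid(mod_main))

# Compare the two models: likelihood-ratio test plus AIC/BIC
cmp <- anova(mod_main, mod_int)
print(cmp)
```

If the interaction model does not fit clearly better (similar AIC, non-significant likelihood-ratio test), the simpler model without the interaction is usually the easier one to interpret. The same diagnostic plots can be repeated for mod_int.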