Should I use a negative binomial GAM?

I’m trying to model presence/absence data for birds in nest boxes. I checked whether the data are zero-inflated and got this:

Observed zeros: 36
  Predicted zeros: 40
            Ratio: 1.11

Is it really worth fitting a negative binomial GAM to it? And if so, what theta should I use, or how can I find it out? Thank you in advance!

I have already fitted a binomial GLM to the data, and I’m not sure whether it is a good idea:

Coefficients:
    (Intercept)      In_Temp_avg  Distance_feeder  
      -1.492394         0.134624         0.005013  

Degrees of Freedom: 57 Total (i.e. Null);  55 Residual
Null Deviance:      76.99 
Residual Deviance: 75.35    AIC: 81.35

To clarify, the response variable (nest box occupancy) is binary (0/1, present/absent), right?

If the response variable is binary (0/1, presence/absence), there’s really not much you can do other than some form of logistic regression (in R, glm(..., family = "binomial")). A negative binomial model, although it sounds like it should be good for binomial data as well, is for count data (0, 1, 2, …), not for binary data (0/1).
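As a minimal sketch of that logistic regression, assuming one row per nest box: the data below are simulated, the names `boxes` and `occupied` are made up for illustration, and only the two predictor names are taken from the output in the question.

```r
# Simulated stand-in data: 58 boxes, to match the 57 total degrees of
# freedom reported in the question. All values are invented.
set.seed(1)
n <- 58
boxes <- data.frame(
  In_Temp_avg     = rnorm(n, mean = 10, sd = 3),
  Distance_feeder = runif(n, min = 0, max = 200)
)
# Hypothetical "true" occupancy probabilities, built on the logit scale
eta <- -1.5 + 0.13 * boxes$In_Temp_avg + 0.005 * boxes$Distance_feeder
boxes$occupied <- rbinom(n, size = 1, prob = plogis(eta))

# Logistic regression for a binary (0/1) response
fit_glm <- glm(occupied ~ In_Temp_avg + Distance_feeder,
               family = binomial, data = boxes)
summary(fit_glm)
```

There is no theta to choose here: theta is a parameter of the negative binomial family, which does not apply to a 0/1 response.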

Binary data can be overdispersed, but overdispersion is easier to detect if the data can be grouped in some way (e.g., all of your predictors are categorical); it looks like your predictors are continuous, which means it will be harder to investigate (and you can probably get away with not checking 🙂 ). If you’re worried about it, you could use a quasibinomial model (family = "quasibinomial" in R).
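A rough sketch of the quasibinomial option, again on simulated data with made-up names (`boxes`, `occupied`). It only shows the mechanics: the coefficients do not change, and the standard errors are rescaled by the square root of the estimated dispersion.

```r
# Simulated stand-in data (invented values, hypothetical names)
set.seed(1)
n <- 58
boxes <- data.frame(
  In_Temp_avg     = rnorm(n, mean = 10, sd = 3),
  Distance_feeder = runif(n, min = 0, max = 200)
)
eta <- -1.5 + 0.13 * boxes$In_Temp_avg + 0.005 * boxes$Distance_feeder
boxes$occupied <- rbinom(n, size = 1, prob = plogis(eta))

fit_bin   <- glm(occupied ~ In_Temp_avg + Distance_feeder,
                 family = binomial, data = boxes)
fit_quasi <- glm(occupied ~ In_Temp_avg + Distance_feeder,
                 family = quasibinomial, data = boxes)

# Estimated dispersion: Pearson chi-square divided by residual df
disp <- summary(fit_quasi)$dispersion
disp

# Same point estimates under both families
cbind(binomial = coef(fit_bin), quasibinomial = coef(fit_quasi))

# Standard errors differ by a factor of sqrt(dispersion)
se_bin   <- summary(fit_bin)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]
cbind(se_bin, se_quasi, ratio = se_quasi / se_bin, sqrt_disp = sqrt(disp))
```

With ungrouped 0/1 data the dispersion estimate is hard to interpret, which fits the caveat that overdispersion is difficult to investigate when the predictors are continuous.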

Similarly, it doesn’t make sense to worry about zero-inflation for binary data: technically speaking, zero-inflation is unidentifiable (i.e., there is no way to tell the difference between structural zeros, nest boxes that would never have birds in them, and sampling zeros, nest boxes that happen not to have birds in them because they have a low probability of occupancy).
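To spell out why, here is a short derivation in my own notation (the symbols are not from the question): let $\pi$ be the probability that a box is a structural zero, and $p$ the occupancy probability of a box that could be occupied. Then

$$
P(Y = 0) = \pi + (1 - \pi)(1 - p) = 1 - (1 - \pi)\,p, \qquad P(Y = 1) = (1 - \pi)\,p .
$$

So $Y$ is still Bernoulli, with success probability $q = (1 - \pi)\,p$, and the data only carry information about that product. For example, $(\pi, p) = (0, 0.4)$ and $(\pi, p) = (0.5, 0.8)$ both give $q = 0.4$ and therefore exactly the same likelihood.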

A GAM is a generalized additive model, which can account for nonlinear patterns in continuous variables. You can fit these easily with mgcv::gam(). It might be worth considering this option.
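A minimal sketch of what that could look like, assuming the same two predictors as in the question. The data are simulated and the names `boxes` and `occupied` are hypothetical; the choice of `method = "REML"` is mine, not something the question specifies.

```r
library(mgcv)  # ships with standard R installations as a recommended package

# Simulated stand-in data (invented values, hypothetical names)
set.seed(1)
n <- 58
boxes <- data.frame(
  In_Temp_avg     = rnorm(n, mean = 10, sd = 3),
  Distance_feeder = runif(n, min = 0, max = 200)
)
eta <- -1.5 + 0.13 * boxes$In_Temp_avg + 0.005 * boxes$Distance_feeder
boxes$occupied <- rbinom(n, size = 1, prob = plogis(eta))

# Binomial GAM: s() lets each predictor have a smooth, possibly nonlinear effect
fit_gam <- gam(occupied ~ s(In_Temp_avg) + s(Distance_feeder),
               family = binomial, data = boxes, method = "REML")

# Effective degrees of freedom close to 1 indicate an essentially linear effect
summary(fit_gam)
```

Calling plot(fit_gam, pages = 1) afterwards draws the estimated smooths. With only a few dozen binary observations, the smooths may well shrink to straight lines, in which case the GAM tells you much the same as the GLM.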

Checking the adequacy of a model for binary data is relatively hard; one way to check the adequacy of your model would be with the DHARMa package.
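A sketch of such a check, assuming DHARMa is installed and using simulated data with made-up names (`boxes`, `occupied`); `n = 250` is DHARMa's default number of simulations, written out here only for clarity.

```r
library(DHARMa)  # assumes the DHARMa package is installed

# Simulated stand-in data (invented values, hypothetical names)
set.seed(1)
n <- 58
boxes <- data.frame(
  In_Temp_avg     = rnorm(n, mean = 10, sd = 3),
  Distance_feeder = runif(n, min = 0, max = 200)
)
eta <- -1.5 + 0.13 * boxes$In_Temp_avg + 0.005 * boxes$Distance_feeder
boxes$occupied <- rbinom(n, size = 1, prob = plogis(eta))

fit_glm <- glm(occupied ~ In_Temp_avg + Distance_feeder,
               family = binomial, data = boxes)

# Simulation-based scaled residuals; these should look uniform on [0, 1]
# if the model is adequate
sim <- simulateResiduals(fittedModel = fit_glm, n = 250)

# Formal tests, with the accompanying plots switched off
testUniformity(sim, plot = FALSE)
testDispersion(sim, plot = FALSE)
```

plot(sim) gives the usual two-panel diagnostic figure (a QQ plot of the scaled residuals and residuals against predicted values), which is usually more informative than the tests alone.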
