How to overlay a fitted “Negative Binomial” distribution on a histogram using ggplot2?

I am working with a dataset that I believe follows a “Negative Binomial” distribution. However, when I fit the Negative Binomial distribution, it turns out to be a poor fit. To explore further, I simulated a Negative Binomial distribution, but even on the simulated data, the overlaying distribution does not provide a good fit.

Here is my simulated data:

library(ggplot2)
library(MASS)  
library(fitdistrplus)
# Generating negative binomial random numbers
n <- 1000  # Number of random numbers
size <- 5  # Number of successes
prob <- 0.3  # Probability of success

# Draw n values from NB(size, prob); arguments are named to avoid ambiguity
negative_binomial <- rnbinom(n, size = size, prob = prob)
xx <- data.frame(negative_binomial)

I want to create a histogram with an overlay of the “Negative Binomial” distribution on this data. Let’s assume that I was given this data, so I had to estimate the parameters of the distribution using fitdistr() from MASS.

fit <- fitdistr(negative_binomial, densfun = "negative binomial")
ggplot(data = xx, aes(negative_binomial)) +
  geom_histogram(
    aes(y = ..density..),
    bins = 18, color = "black", fill = "lightblue") +
  stat_function(fun = dnbinom,
    args = list(mu = fit$estimate[2], size = fit$estimate[1]),
    color = "red", size = 1)
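A note on the arguments passed to dnbinom() above: fitdistr() reports the estimates as size and mu, not size and prob. In R's documented alternative parameterization, prob = size / (size + mu), so the simulation settings imply mu = size * (1 - prob) / prob. The short check below is only a sketch confirming that both parameterizations describe the same distribution; the range 0:50 is an arbitrary choice for illustration.

```r
size <- 5
prob <- 0.3

# Mean implied by the (size, prob) parameterization
mu <- size * (1 - prob) / prob

# The PMF is the same whether it is specified via prob or via mu
same_pmf <- all.equal(dnbinom(0:50, size = size, prob = prob),
                      dnbinom(0:50, size = size, mu = mu))
print(mu)
print(same_pmf)
```

So passing mu and size to dnbinom() is a consistent choice here, and the parameterization by itself should not be the source of the mismatch.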

Question: Despite knowing that the simulated data is Negative Binomial, why does the overlaying distribution provide such a poor fit to the data? What did I do wrong?


  • “a dataset that I believe follows a ‘Negative Binomial’ distribution”: how do you know, and why do you believe that?

  • My real data are count data, and based on historical data I know they follow a Negative Binomial distribution. But please set my real data aside and tell me why the Negative Binomial is a poor fit on simulated “Negative Binomial” data.

  • Usually, when you are interested in the distribution, you are interested in the distribution conditional on predictors. In particular, you would care about a negative binomial distribution if you intend to fit a generalized linear model (GLM) with the negative-binomial distribution family. Fitting the distribution to the data is not helpful for that purpose. So, why are you doing this?

  • non-integer x: hist(xx$negative_binomial, prob = TRUE, col = "lightblue", breaks = 18L); curve(dnbinom(x, mu = fit$estimate[2L], size = fit$estimate[1L]), 0L, 40L, col = "red", add = TRUE) issues warnings that say: “~you are treating a discrete distribution as continuous”.
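Following up on the non-integer x point, here is a minimal sketch of one way to draw the overlay so that the histogram and the fitted curve are on the same scale. It assumes two things not in the original code: the histogram uses bins of width 1 centred on the integers (so each bar's density equals the observed proportion for that count), and the fitted PMF is evaluated only at integer counts. The seed and the helper data frame pmf are introduced purely for illustration, and after_stat() assumes a reasonably recent ggplot2.

```r
library(ggplot2)
library(MASS)

set.seed(1)  # assumed seed, only so the sketch is reproducible
negative_binomial <- rnbinom(1000, size = 5, prob = 0.3)
xx <- data.frame(negative_binomial)

# Estimate size and mu by maximum likelihood
fit <- fitdistr(negative_binomial, densfun = "negative binomial")

# Evaluate the fitted PMF at integer counts only (hypothetical helper frame)
pmf <- data.frame(x = 0:max(negative_binomial))
pmf$p <- dnbinom(pmf$x,
                 size = fit$estimate["size"],
                 mu = fit$estimate["mu"])

p <- ggplot(xx, aes(negative_binomial)) +
  # One bar per integer: with binwidth 1, density equals the proportion
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 1, center = 0,
                 color = "black", fill = "lightblue") +
  # Fitted probabilities drawn at the integers, joined by a line as a visual guide
  geom_point(data = pmf, aes(x, p), color = "red") +
  geom_line(data = pmf, aes(x, p), color = "red")
print(p)
```

With bins = 18 the bars span more than one count each, so their densities are averages over several integers, while stat_function() evaluates dnbinom() at many non-integer points; either effect alone can make a correct fit look poor.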
