For the above Hypothesis, I calculate the analytical p-value (which is the true value ). I am trying to simulate the result using Monte Carlo simulation but I am not getting any closer value. What should be the correct code for simulated/empirical p-value?
mu_1 = 8.2
mu_2 = 10.5
set.seed(212)
n=100
y_sample <- rnorm(n, mean = (mu_1 + mu_2) / 2, sd = 3)
x_bar <- mean(y_sample)
sigma <- sd(y_sample)
# Calculate true p-value analytically
z_1 = (x_bar - mu_1) / (sigma / sqrt(n))
z_2 = (x_bar - mu_2) / (sigma / sqrt(n))
analytical_p_value = ifelse( z_1 + z_2 < 0, pnorm(-z_1) + pnorm(-z_2)-1, pnorm(z_1) + pnorm(z_2)-1 )
I am not sure how to modify the following part:
# Simulate N=10000 sets of data of size n, assuming the null is true
N=10000
t <- rep(0, N)
for (j in 1:N) {
z_1 <- rnorm(n, mu_1, sigma)
z_2 <- rnorm(n, mu_2, sigma)
z = (z_1 + z_2)/2
t[j] <- abs( mean(z) - (mu_1+mu_2)/2 )
}
# Calculate empirical p-value
empirical_p_value = sum(t > abs(x_bar - (mu_1+mu_2)/2 )) / N
> analytical_p_value
[1] 0.001078676
> empirical_p_value
[1] 0.2986
It would be a bit helpful if you provide a reference for the analytical derivation. Next, I might record a few of the means and sd’s you get via each simulation to see if they are at least close to each other, then graph
y_sample
and thez_1
andz_2
from each simulation to verify they are what you want.