How to Calculate a 2D Empirical CDF via histogram2d

Question

I am trying to obtain a matrix representation of an empirical two-dimensional CDF given two data samples of the same size.

I have two sorted data samples of the same size: sorted_sample1 and sorted_sample2. I want a matrix that represents their two-dimensional empirical CDF. Currently I have

hist_values, x_edges, y_edges = np.histogram2d(sorted_sample1, sorted_sample2, density=True, bins=[num_bins_x, num_bins_y], range=bin_range)

This gives me a two-dimensional empirical PDF. If this were 1D, I would apply np.cumsum to my empirical PDF and then normalise to get the CDF. However, I’m not sure what to do in the 2D case. The two options I’ve come up with are:

1. Apply np.cumsum along one axis and then along the other: cumsum_x = np.cumsum(hist_values, axis=0) followed by cumsum_xy = np.cumsum(cumsum_x, axis=1), and then normalise. This feels wrong, though, and I’m wondering whether the order of the axes matters.
2. Unravel my 2D matrix, apply cumsum to that, and then reshape it: cumsum = np.cumsum(hist_values.ravel()).reshape(hist_values.shape), and then normalise.

Is either of these options correct, and if not, what is? Thanks.
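
For reference, here is a minimal sketch comparing the two options on synthetic data. The samples, bin counts, and names such as cdf_xy and cdf_ravel are assumptions made for illustration and are not from my actual code. It uses raw counts rather than density=True, since a density would first have to be multiplied by each bin's area before summing. Under these assumptions, applying np.cumsum along one axis and then the other gives the same matrix in either order and agrees with a direct count of points below a pair of bin edges, whereas the raveled cumsum gives a different matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two samples in the question (paired, same size).
sample1 = rng.normal(size=1000)
sample2 = 0.5 * sample1 + rng.normal(size=1000)

num_bins_x, num_bins_y = 20, 30

# Raw counts per cell; rows index x bins, columns index y bins.
counts, x_edges, y_edges = np.histogram2d(
    sample1, sample2, bins=[num_bins_x, num_bins_y]
)
total = counts.sum()

# First option: cumulative sum along x, then along y, then normalise.
cdf_xy = np.cumsum(np.cumsum(counts, axis=0), axis=1) / total

# Same operation with the axis order swapped.
cdf_yx = np.cumsum(np.cumsum(counts, axis=1), axis=0) / total

# Second option: cumulative sum over the flattened (row-major) array.
cdf_ravel = (np.cumsum(counts.ravel()) / total).reshape(counts.shape)

# Direct check at one interior cell: fraction of points below both upper edges.
i, j = 7, 11
direct = np.mean((sample1 < x_edges[i + 1]) & (sample2 < y_edges[j + 1]))

print("axis order irrelevant:", np.allclose(cdf_xy, cdf_yx))
print("matches direct count: ", np.isclose(cdf_xy[i, j], direct))
print("ravel version matches:", np.allclose(cdf_xy, cdf_ravel))
```

Here cdf_xy[i, j] estimates the probability that a point falls below x_edges[i + 1] and y_edges[j + 1] simultaneously. If a range argument excludes some points, dividing by counts.sum() normalises over the in-range points only.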
