How to Calculate a 2D Empirical CDF via histogram2d

I am trying to obtain a matrix representation of an empirical 2-dimensional CDF given two data samples of the same size.

I have two sorted data samples of the same size: sorted_sample1 and sorted_sample2. I want a matrix that represents their 2-dimensional empirical CDF. Currently I have:

hist_values, x_edges, y_edges = np.histogram2d(sorted_sample1, sorted_sample2, density=True, bins=[num_bins_x, num_bins_y], range=bin_range)

This gives me a 2-dimensional empirical PDF. If this were 1D, I would apply np.cumsum to my empirical PDF and then normalise to get the CDF. However, I’m not sure what to do in the 2D case. The two options I’ve come up with are:

  1. Apply np.cumsum along one axis and then along the other. This feels wrong, and I’m wondering whether the order of the axes matters: cumsum_x = np.cumsum(hist_values, axis=0) followed by cumsum_xy = np.cumsum(cumsum_x, axis=1). Then I can normalise.

  2. Unravel my 2D matrix, apply cumsum to that, and then reshape it: cumsum = np.cumsum(hist_values.ravel()).reshape(hist_values.shape). Then I can normalise.

    Is either of these options correct, and if not, what is? Thanks.
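To make the two options concrete, here is a minimal sketch that runs both on synthetic data. The normally distributed samples, the seed, and the bin counts are assumptions made for illustration only, standing in for sorted_sample1 and sorted_sample2. Because density=True returns densities rather than probabilities, the sketch first multiplies by the bin areas, so that cell (i, j) of the double cumulative sum estimates P(X <= x_edges[i+1], Y <= y_edges[j+1]).

```python
import numpy as np

# Synthetic stand-ins for the two samples (assumption: illustrative data only).
rng = np.random.default_rng(0)
sample1 = rng.normal(size=1000)
sample2 = rng.normal(size=1000)

hist_values, x_edges, y_edges = np.histogram2d(
    sample1, sample2, bins=[20, 30], density=True
)

# density=True gives a density; multiply by each bin's area to get the
# probability mass per bin, which sums to 1.
bin_areas = np.outer(np.diff(x_edges), np.diff(y_edges))
pmf = hist_values * bin_areas

# Option 1: cumulative sum along one axis, then along the other.
cdf_xy = np.cumsum(np.cumsum(pmf, axis=0), axis=1)
cdf_yx = np.cumsum(np.cumsum(pmf, axis=1), axis=0)

# Option 2: cumulative sum over the flattened array, then reshape.
cdf_flat = np.cumsum(pmf.ravel()).reshape(pmf.shape)

print(np.allclose(cdf_xy, cdf_yx))    # does the axis order matter?
print(np.allclose(cdf_xy, cdf_flat))  # do options 1 and 2 agree?
print(cdf_xy[-1, -1])                 # total mass in the last cell
```

In this sketch the two axis orders give the same matrix, since each entry is just the sum of all bins at or below that index in both directions. The flattened version accumulates entire earlier rows instead of only the bins with a smaller column index, so it does not correspond to a joint CDF.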

  • What do you mean by “sorted data samples of the same size”? These are 2D samples, so how could you sort them, and have both sorted at the same time? Could you please provide a sample of data1 and data2?

  • @SeverinPappadeux both sets of samples are measurements of a network. They have the same length.
