I have a tibble with an id column, a G grouping variable, and 300 numeric variables.
I want a method that clusters the rows so that, within each level of the grouping variable, every row is paired with exactly one other row. Spare rows in groups with an odd number of rows can be left out of the clusters.
So, if a group has 4 rows, there will be 2 clusters of 2. If it has 5 rows, there will be 2 clusters of 2 and one spare row.
I think I like the Mahalanobis distance for clustering but I am open to an alternative proposal.
I think that a diagnostic variable holding the intra-cluster Mahalanobis distance for each pair could help, too.
Technically speaking, MatchIt does something very similar, but it imposes a binary classification on the rows. I don’t want to need such a classification.
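
To make the target concrete, here is a minimal sketch of the behaviour I am after. It is written in plain Python rather than R, only to pin down the logic. Everything in it is an assumption made for illustration: the toy values, the use of 2 numeric variables instead of 300, the covariance pooled over all rows, and the hypothetical helper names (`inverse_covariance_2x2`, `mahalanobis`, `best_pairing`). The exhaustive search is only feasible for small groups, and with 300 variables the covariance matrix would likely need regularising or a reduced set of dimensions before it can be inverted.

```python
from itertools import combinations

# Toy rows: (id, group, x1, x2). All values are made up for illustration.
rows = [
    (1, "a", 1.0, 2.0), (2, "a", 1.1, 2.1),
    (3, "a", 5.0, 1.0), (4, "a", 5.2, 0.9),
    (5, "b", 0.0, 0.0), (6, "b", 3.0, 3.0), (7, "b", 0.2, 0.1),
]

def inverse_covariance_2x2(points):
    """Inverse of the sample covariance matrix of a list of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    return ((syy / det, -sxy / det), (-sxy / det, sxx / det))

def mahalanobis(u, v, inv):
    """Mahalanobis distance between two 2-D points given an inverse covariance."""
    dx, dy = u[0] - v[0], u[1] - v[1]
    q = (dx * (inv[0][0] * dx + inv[0][1] * dy)
         + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return q ** 0.5

def best_pairing(ids, dist):
    """Exhaustive minimum-total-distance pairing of ids.

    Returns (total distance, list of pairs, spare id or None).
    """
    if len(ids) < 2:
        return 0.0, [], (ids[0] if ids else None)
    best = None
    if len(ids) % 2 == 1:
        # Odd group: try leaving each row out as the spare.
        for spare in ids:
            rest = [i for i in ids if i != spare]
            total, pairs, _ = best_pairing(rest, dist)
            if best is None or total < best[0]:
                best = (total, pairs, spare)
        return best
    # Even group: the first row must pair with one of the others.
    first, others = ids[0], ids[1:]
    for partner in others:
        rest = [i for i in others if i != partner]
        total, pairs, _ = best_pairing(rest, dist)
        total += dist[frozenset((first, partner))]
        if best is None or total < best[0]:
            best = (total, [(first, partner)] + pairs, None)
    return best

points = {r[0]: (r[2], r[3]) for r in rows}
# Assumption: one covariance matrix pooled over all rows, not one per group.
inv = inverse_covariance_2x2(list(points.values()))
dist = {frozenset((a, b)): mahalanobis(points[a], points[b], inv)
        for a, b in combinations(points, 2)}

# Collect ids per group, then pair within each group only.
groups = {}
for r in rows:
    groups.setdefault(r[1], []).append(r[0])

# Each entry: (id, group, cluster label or None, within-pair distance or None).
result = []
for g, ids in groups.items():
    _, pairs, spare = best_pairing(ids, dist)
    for n, (a, b) in enumerate(pairs, start=1):
        d = dist[frozenset((a, b))]
        result.append((a, g, f"{g}_{n}", d))
        result.append((b, g, f"{g}_{n}", d))
    if spare is not None:
        result.append((spare, g, None, None))

for entry in sorted(result):
    print(entry)
```

In this toy data, group "a" has 4 rows and yields 2 pairs, while group "b" has 5 minus 2, that is 3 rows, and yields 1 pair plus a spare row with no cluster label. The last field is the diagnostic I mentioned: the Mahalanobis distance between the two members of each pair.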