The categorical Gini correlation proposed by Dang et al. is a dependence measure that characterizes independence between categorical and numerical variables. The asymptotic distributions of the sample correlation under dependence and independence have been established when the dimension of the numerical variable is fixed. However, its asymptotic behavior for high-dimensional data has not been explored. In this paper, we develop the central limit theorem for the Gini correlation in the more realistic setting where the dimensionality of the numerical variable is diverging. We then construct a powerful and consistent test for the K-sample problem based on the asymptotic normality. The proposed test not only avoids the computational burden of the permutation procedure but also gains power over it. Simulation studies and real data illustrations show that the proposed test compares favorably with existing methods across a broad range of realistic situations, especially in unbalanced cases.
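For orientation, the sample version of the measure can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes the usual form of the categorical Gini correlation, one minus the ratio of the group-weighted within-group mean pairwise Euclidean distance to the overall mean pairwise distance. The function names `mean_pairwise_distance` and `gini_correlation` are hypothetical.

```python
import numpy as np


def mean_pairwise_distance(x):
    """Average Euclidean distance over all unordered pairs of rows of x."""
    n = x.shape[0]
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep only pairs i < j, so each unordered pair is counted once.
    return dists[np.triu_indices(n, k=1)].mean()


def gini_correlation(x, labels):
    """Sample categorical Gini correlation (assumed form, see lead-in).

    x      : (n, d) array of numerical observations (1-D input is treated as d = 1)
    labels : length-n array of group labels; each group needs at least 2 rows
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    labels = np.asarray(labels)
    n = x.shape[0]

    overall = mean_pairwise_distance(x)
    within = 0.0
    for k in np.unique(labels):
        xk = x[labels == k]
        # Weight each group's mean distance by its sample proportion n_k / n.
        within += (xk.shape[0] / n) * mean_pairwise_distance(xk)
    return 1.0 - within / overall


# Usage: three groups in dimension 50, with a mean shift in the third group.
rng = np.random.default_rng(0)
x = rng.normal(size=(90, 50))
labels = np.repeat([0, 1, 2], 30)
x[labels == 2] += 0.5
print(gini_correlation(x, labels))
```

Under this form, values near zero are consistent with equal distributions across the K groups, while larger values indicate separation. A permutation test would recompute the statistic over many random relabelings, which is the computational cost that a test based on asymptotic normality avoids.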
We thank Jun Li for sharing R code for two-sample tests with us.
"Asymptotic normality of Gini correlation in high dimension with applications to the K-sample problem." Electron. J. Statist. 17 (2) 2539 - 2574, 2023. https://doi.org/10.1214/23-EJS2165