## Bernoulli

• Volume 23, Number 2 (2017), 951-989.

### Two-sample smooth tests for the equality of distributions

#### Abstract

This paper considers the problem of testing the equality of two unspecified distributions. Classical omnibus tests such as the Kolmogorov–Smirnov and Cramér–von Mises tests are known to suffer from low power against essentially all but location-scale alternatives. We propose a new two-sample test that modifies Neyman’s smooth test and extend it to the multivariate case based on the idea of projection pursuit. The asymptotic null property of the test and its power against local alternatives are studied. The multiplier bootstrap method is employed to compute the critical value of the multivariate test. We establish the validity of the bootstrap approximation in the case where the dimension is allowed to grow with the sample size. Numerical studies show that the new testing procedures perform well even for small sample sizes and are powerful in detecting local features or high-frequency components.
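For orientation, the general idea of a univariate two-sample smooth test can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the procedure of the paper: it maps pooled ranks to (0, 1), compares sample means of orthonormal shifted Legendre polynomials, takes the maximum absolute component, and calibrates by permutation rather than by the multiplier bootstrap mentioned in the abstract. The function names and the defaults `d=4` and `n_perm=999` are hypothetical choices made for the example.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_features(u, d):
    """Orthonormal shifted Legendre polynomials psi_1..psi_d at points u in (0, 1)."""
    x = 2.0 * np.asarray(u, dtype=float) - 1.0  # map (0, 1) to (-1, 1)
    cols = []
    for k in range(1, d + 1):
        coef = np.zeros(k + 1)
        coef[k] = 1.0  # select the degree-k Legendre polynomial P_k
        cols.append(np.sqrt(2 * k + 1) * legendre.legval(x, coef))
    return np.column_stack(cols)


def smooth_stat(features, is_x):
    """Max over k of the scaled difference in sample means of psi_k."""
    n, m = is_x.sum(), (~is_x).sum()
    diff = features[is_x].mean(axis=0) - features[~is_x].mean(axis=0)
    return np.sqrt(n * m / (n + m)) * np.max(np.abs(diff))


def two_sample_smooth_test(x, y, d=4, n_perm=999, seed=0):
    """Permutation-calibrated smooth-type test (illustrative, not the paper's test)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([np.asarray(x), np.asarray(y)])
    total = pooled.size
    ranks = np.argsort(np.argsort(pooled)) + 1  # ranks 1..N, ties broken arbitrarily
    u = ranks / (total + 1.0)                   # pooled ranks mapped into (0, 1)
    feats = legendre_features(u, d)
    is_x = np.arange(total) < len(x)            # True for first-sample observations
    observed = smooth_stat(feats, is_x)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(is_x)        # random relabelling of the two samples
        if smooth_stat(feats, shuffled) >= observed:
            exceed += 1
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value
```

Calling `two_sample_smooth_test(x, y)` on two one-dimensional arrays returns the statistic and a permutation p-value. Higher-order components (larger `d`) are the ones that respond to high-frequency differences between the two samples.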

#### Article information

Source
Bernoulli, Volume 23, Number 2 (2017), 951-989.

Dates
Revised: September 2015
First available in Project Euclid: 4 February 2017

https://projecteuclid.org/euclid.bj/1486177389

Digital Object Identifier
doi:10.3150/15-BEJ766

Mathematical Reviews number (MathSciNet)
MR3606756

Zentralblatt MATH identifier
1380.62202

#### Citation

Zhou, Wen-Xin; Zheng, Chao; Zhang, Zhen. Two-sample smooth tests for the equality of distributions. Bernoulli 23 (2017), no. 2, 951–989. doi:10.3150/15-BEJ766. https://projecteuclid.org/euclid.bj/1486177389
