## Bernoulli

• Volume 25, Number 3 (2019), 1816-1837.

### A one-sample test for normality with kernel methods

#### Abstract

We propose a new one-sample test for normality in a Reproducing Kernel Hilbert Space (RKHS). Namely, we test the null hypothesis that the data distribution belongs to a given family of Gaussian distributions. Hence, our procedure may be applied either to test data for normality or to test parameters (mean and covariance) if the data are assumed Gaussian. Our test is based on the same principle as the Maximum Mean Discrepancy (MMD), which is usually used for two-sample tests such as homogeneity or independence testing. Our method makes use of a special kind of parametric bootstrap (typical of goodness-of-fit tests) which is computationally more efficient than the standard parametric bootstrap. Moreover, an upper bound for the Type-II error highlights the dependence on influential quantities. Experiments illustrate the practical improvement allowed by our test in high-dimensional settings where common normality tests are known to fail. We also consider an application to covariance rank selection through a sequential procedure.
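To make the MMD principle mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian kernel $k(x,y)=\exp(-\|x-y\|^2/(2\sigma^2))$, for which the squared RKHS distance between the empirical mean embedding of a sample and the embedding of $N(m, S)$ has a closed form. Calibration here uses the plain parametric bootstrap with plug-in parameters, not the faster bootstrap variant the paper proposes. The function names (`gaussian_gram`, `mmd2_to_gaussian`, `normality_pvalue`) and the defaults `sigma=1.0` and `n_boot=200` are illustrative assumptions.

```python
import numpy as np


def gaussian_gram(x, sigma):
    """Gram matrix of k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(x**2, axis=1)
    # Clip tiny negative values caused by floating-point cancellation.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))


def mmd2_to_gaussian(x, mean, cov, sigma=1.0):
    """Squared RKHS distance (V-statistic) between the sample x and N(mean, cov)."""
    n, d = x.shape
    eye = np.eye(d)
    # (1/n^2) sum_{i,j} k(x_i, x_j)
    term_xx = gaussian_gram(x, sigma).mean()
    # E_Y k(x_i, Y) = |I + S/sigma^2|^{-1/2} exp(-(x_i-m)^T (S + sigma^2 I)^{-1} (x_i-m) / 2)
    diff = x - mean
    solved = np.linalg.solve(cov + sigma**2 * eye, diff.T).T
    quad = np.einsum("ij,ij->i", diff, solved)
    scale = np.linalg.det(eye + cov / sigma**2) ** -0.5
    term_xq = scale * np.exp(-0.5 * quad).mean()
    # E_{Y,Y'} k(Y, Y') = |I + 2 S/sigma^2|^{-1/2}
    term_qq = np.linalg.det(eye + 2.0 * cov / sigma**2) ** -0.5
    return term_xx - 2.0 * term_xq + term_qq


def normality_pvalue(x, sigma=1.0, n_boot=200, seed=0):
    """Statistic and p-value from a standard parametric bootstrap (assumption:
    this is the slow textbook scheme, not the paper's faster variant)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    mean = x.mean(axis=0)
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    observed = mmd2_to_gaussian(x, mean, cov, sigma)
    exceed = 0
    for _ in range(n_boot):
        # Draw from the fitted Gaussian and re-estimate its parameters.
        xb = rng.multivariate_normal(mean, cov, size=n)
        mb = xb.mean(axis=0)
        cb = np.atleast_2d(np.cov(xb, rowvar=False))
        exceed += mmd2_to_gaussian(xb, mb, cb, sigma) >= observed
    return observed, (1 + exceed) / (1 + n_boot)
```

Calling `normality_pvalue(x)` on an `(n, d)` array returns the statistic and a bootstrap p-value; a small p-value suggests departure from the Gaussian family. In practice the bandwidth `sigma` would need to be tuned to the scale of the data, which this sketch does not attempt.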

#### Article information

Source
Bernoulli, Volume 25, Number 3 (2019), 1816-1837.

Dates
Revised: March 2018
First available in Project Euclid: 12 June 2019

https://projecteuclid.org/euclid.bj/1560326429

Digital Object Identifier
doi:10.3150/18-BEJ1037

Mathematical Reviews number (MathSciNet)
MR3961232

Zentralblatt MATH identifier
07066241

#### Citation

Kellner, Jérémie; Celisse, Alain. A one-sample test for normality with kernel methods. Bernoulli 25 (2019), no. 3, 1816–1837. doi:10.3150/18-BEJ1037. https://projecteuclid.org/euclid.bj/1560326429

#### References

• [1] Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 337–404.
• [2] Bien, J. and Tibshirani, R.J. (2011). Sparse estimation of a covariance matrix. Biometrika 98 807–820.
• [3] Blanchard, G., Sugiyama, M., Kawanabe, M., Spokoiny, V. and Müller, K.-R. (2006). Non-Gaussian component analysis: A semi-parametric framework for linear dimension reduction. In NIPS.
• [4] Bouveyron, C., Fauvel, M. and Girard, S. (2015). Kernel discriminant analysis and clustering with parsimonious Gaussian process models. Stat. Comput. 25 1143–1162.
• [5] Brunel, É., Mas, A. and Roche, A. (2016). Non-asymptotic adaptive prediction in functional linear models. J. Multivariate Anal. 143 208–232.
• [6] Burke, M.D. (2000). Multivariate tests-of-fit and uniform confidence bands using a weighted bootstrap. Statist. Probab. Lett. 46 13–20.
• [7] Cardot, H. and Johannes, J. (2010). Thresholding projection estimators in functional linear models. J. Multivariate Anal. 101 395–408.
• [8] Choi, Y., Taylor, J. and Tibshirani, R. (2017). Selecting the number of principal components: Estimation of the true rank of a noisy matrix. Ann. Statist. 45 2590–2617.
• [9] Christmann, A. and Steinwart, I. (2010). Universal kernels on non-standard input spaces. In Advances in Neural Information Processing Systems 406–414.
• [10] Cuesta-Albertos, J.A., del Barrio, E., Fraiman, R. and Matrán, C. (2007). The random projection method in goodness of fit for functional data. Comput. Statist. Data Anal. 51 4814–4831.
• [11] Diederichs, E., Juditsky, A., Nemirovski, A. and Spokoiny, V. (2013). Sparse non Gaussian component analysis by semidefinite programming. Mach. Learn. 91 211–238.
• [12] Diederichs, E., Juditsky, A., Spokoiny, V. and Schütte, C. (2010). Sparse non-Gaussian component analysis. IEEE Trans. Inform. Theory 56 3033–3047.
• [13] Frigyik, B.A., Srivastava, S. and Gupta, M.R. (2008). An introduction to functional derivatives. Technical Report, Dept. Electr. Eng., Univ. Washington, Seattle, WA.
• [14] Fukumizu, K., Gretton, A., Sun, X. and Schölkopf, B. (2007). Kernel measures of conditional dependence. In NIPS 20 489–496.
• [15] Fukumizu, K., Sriperumbudur, B., Gretton, A. and Schölkopf, B. (2009). Characteristic kernels on groups and semigroups. In NIPS.
• [16] Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B. and Smola, A. (2007). A kernel method for the two-sample-problem. In Advances in Neural Information Processing Systems (B. Schölkopf, J. Platt and T. Hoffman, eds.) 19 513–520. Cambridge: MIT Press.
• [17] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B. and Smola, A. (2012). A kernel two-sample test. J. Mach. Learn. Res. 13 723–773.
• [18] Gretton, A., Fukumizu, K., Harchaoui, Z. and Sriperumbudur, B.K. (2009). A fast, consistent kernel two-sample test. In NIPS.
• [19] Gretton, A., Fukumizu, K., Teo, C.H., Song, L., Schölkopf, B. and Smola, A.J. (2007). A kernel statistical test of independence. In NIPS 21.
• [20] Gretton, A., Sejdinovic, D., Strathmann, H., Balakrishnan, S., Pontil, M., Fukumizu, K. and Sriperumbudur, B.K. (2012). Optimal kernel choice for large-scale two-sample tests. In Advances in Neural Information Processing Systems 1205–1213.
• [21] Henze, N. and Zirkler, B. (1990). A class of invariant consistent tests for multivariate normality. Comm. Statist. Theory Methods 19 3595–3617.
• [22] Hoffmann-Jørgensen, J. and Pisier, G. (1976). The law of large numbers and the central limit theorem in Banach spaces. Ann. Probab. 4 587–599.
• [23] Josse, J. and Husson, F. (2012). Selecting the number of components in principal component analysis using cross-validation approximations. Comput. Statist. Data Anal. 56 1869–1879.
• [24] Kellner, J. and Celisse, A. (2019). Supplement to “A one-sample test for normality with kernel methods.” DOI:10.3150/18-BEJ1037SUPP.
• [25] Kojadinovic, I. and Yan, J. (2012). Goodness-of-fit testing based on a weighted bootstrap: A fast large-sample alternative to the parametric bootstrap. Canad. J. Statist. 40 480–500.
• [26] Lehmann, E.L. and Romano, J.P. (2005). Testing Statistical Hypotheses, 3rd ed. New York: Springer.
• [27] Mardia, K.V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika 57 519–530.
• [28] Ratsimalahelo, Z. (2003). Strongly consistent determination of the rank of matrix. Econometrics.
• [29] Reddi, S., Ramdas, A., Poczos, B., Singh, A. and Wasserman, L. (2015). On the high dimensional power of a linear-time two sample test under mean-shift alternatives. J. Mach. Learn. Res. 772–780.
• [30] Robin, J.-M. and Smith, R.J. (2000). Tests of rank. Econometric Theory 16 151–175.
• [31] Roth, V. (2006). Kernel Fisher discriminants for outlier detection. Neural Comput. 18 942–960.
• [32] Roweis, S. (1998). EM algorithms for PCA and SPCA. In Advances in Neural Information Processing Systems 626–632. MIT Press.
• [33] Schölkopf, B., Smola, A. and Müller, K.-R. (1997). Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10 1299–1319.
• [34] Sejdinovic, D., Sriperumbudur, B., Gretton, A. and Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Statist. 41 2263–2291.
• [35] Sriperumbudur, B.K., Gretton, A., Fukumizu, K., Schölkopf, B. and Lanckriet, G.R.G. (2010). Hilbert space embeddings and metrics on probability measures. J. Mach. Learn. Res. 11 1517–1561.
• [36] Srivastava, M.S., Katayama, S. and Kano, Y. (2013). A two sample test in high dimensional data. J. Multivariate Anal. 114 349–358.
• [37] Stute, W., González Manteiga, W. and Presedo Quindimil, M. (1993). Bootstrap based goodness-of-fit tests. Metrika 40 243–256.
• [38] Svantesson, T. and Wallace, J.W. (2003). Tests for assessing multivariate normality and the covariance structure of MIMO data. In Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP’03). 2003 IEEE International Conference on 4 IV–656. IEEE.
• [39] Székely, G.J. and Rizzo, M.L. (2005). A new test for multivariate normality. J. Multivariate Anal. 93 58–80.
• [40] Zwald, L. (2005). Performances d’Algorithmes Statistiques d’Apprentissage: “Kernel Projection Machine” et Analyse en Composantes Principales à Noyaux. Ph.D. thesis, Université Paris XI U.F.R. Scientifique d’Orsay.

#### Supplemental materials

• Supplement to “A one-sample test for normality with kernel methods”. The supplemental article [24] consists of four appendices. Appendix A briefly introduces the normality tests mentioned throughout this article (such as Henze–Zirkler or Energy distance). Appendix B details the proofs of the theorems presented in this article. Appendix C shows additional experiments. Finally, Appendix D gives explicit closed-form expressions for the Fréchet derivative of $N[\theta]$ for practitioners.