## Bernoulli

• Volume 23, Number 3 (2017), 1599-1630.

### A nonparametric two-sample hypothesis testing problem for random graphs

#### Abstract

We consider the problem of testing whether two independent finite-dimensional random dot product graphs have generating latent positions that are drawn from the same distribution, or distributions that are related via scaling or projection. We propose a test statistic that is a kernel-based function of the estimated latent positions obtained from the adjacency spectral embedding for each graph. We show that our test statistic using the estimated latent positions converges to the test statistic obtained using the true but unknown latent positions and hence that our proposed test procedure is consistent across a broad range of alternatives. Our proof of consistency hinges upon a novel concentration inequality for the suprema of an empirical process in the estimated latent positions setting.
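The pipeline the abstract describes has two stages: estimate each graph's latent positions with the adjacency spectral embedding, then compare the two estimated point clouds with a kernel-based two-sample statistic. The following is a minimal Python/NumPy sketch of that general idea, not the authors' implementation. Several choices are assumptions that do not come from the abstract. It uses a Gaussian kernel with bandwidth `sigma = 0.5`, the unbiased (U-statistic) squared-MMD form of the statistic, a known embedding dimension `d = 1`, and a permutation test for calibration. The helper names (`sample_rdpg`, `adjacency_spectral_embedding`, `mmd_u_statistic`, `permutation_pvalue`) are hypothetical. With `d = 1`, the orthogonal nonidentifiability of the latent positions reduces to a sign flip. For `d > 1`, the estimates are identified only up to an orthogonal transformation, and this sketch does not handle that case.

```python
import numpy as np


def sample_rdpg(X, rng):
    """Sample a symmetric, hollow adjacency matrix with P[i, j] = <X_i, X_j>."""
    P = X @ X.T
    upper = np.triu(rng.random(P.shape) < P, k=1).astype(float)
    return upper + upper.T


def adjacency_spectral_embedding(A, d):
    """Estimate d-dimensional latent positions as U_A S_A^{1/2} from the top-d eigenpairs."""
    vals, vecs = np.linalg.eigh(A)              # eigenvalues in ascending order
    top = np.argsort(np.abs(vals))[::-1][:d]    # indices of the d largest |eigenvalues|
    Xhat = vecs[:, top] * np.sqrt(np.abs(vals[top]))
    # Eigenvectors are defined only up to sign; for d = 1 (assumed here) that is the
    # whole orthogonal ambiguity, so flip each column to have a nonnegative sum.
    return Xhat * np.where(Xhat.sum(axis=0) < 0, -1.0, 1.0)


def gaussian_gram(X, Y, sigma):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) (assumed kernel)."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def mmd_u_statistic(X, Y, sigma):
    """Unbiased (U-statistic) estimate of the squared MMD between samples X and Y."""
    n, m = len(X), len(Y)
    Kxx = gaussian_gram(X, X, sigma)
    Kyy = gaussian_gram(Y, Y, sigma)
    Kxy = gaussian_gram(X, Y, sigma)
    within_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))  # excludes i == j terms
    within_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return within_x + within_y - 2.0 * Kxy.mean()


def permutation_pvalue(X, Y, sigma, n_perm, rng):
    """Calibrate the statistic by permuting the pooled estimated latent positions."""
    observed = mmd_u_statistic(X, Y, sigma)
    Z, n = np.vstack([X, Y]), len(X)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))
        exceed += mmd_u_statistic(Z[idx[:n]], Z[idx[n:]], sigma) >= observed
    return (exceed + 1) / (n_perm + 1)


rng = np.random.default_rng(0)
n, d, sigma = 300, 1, 0.5
X1 = rng.uniform(0.2, 0.8, size=(n, d))   # graph 1 latent positions
X2 = rng.uniform(0.2, 0.8, size=(n, d))   # graph 2: same distribution as graph 1
X3 = rng.uniform(0.6, 0.9, size=(n, d))   # graph 3: a different distribution

E1, E2, E3 = (adjacency_spectral_embedding(sample_rdpg(X, rng), d) for X in (X1, X2, X3))
stat_same = mmd_u_statistic(E1, E2, sigma)
stat_diff = mmd_u_statistic(E1, E3, sigma)
p_diff = permutation_pvalue(E1, E3, sigma, n_perm=99, rng=rng)
print(f"same: {stat_same:.4f}  different: {stat_diff:.4f}  p(different) = {p_diff:.2f}")
```

In this toy setting, the statistic computed from the estimated positions should sit near zero when both graphs share a latent distribution and should be clearly positive when they do not. This mirrors, informally, the abstract's point that the statistic from estimated positions tracks the one from the true positions. The permutation calibration is a generic choice made for this sketch.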

#### Article information

Source
Bernoulli, Volume 23, Number 3 (2017), 1599-1630.

Dates
Revised: November 2015
First available in Project Euclid: 17 March 2017

Permanent link to this document
https://projecteuclid.org/euclid.bj/1489737619

Digital Object Identifier
doi:10.3150/15-BEJ789

Mathematical Reviews number (MathSciNet)
MR3624872

Zentralblatt MATH identifier
06714313

#### Citation

Tang, Minh; Athreya, Avanti; Sussman, Daniel L.; Lyzinski, Vince; Priebe, Carey E. A nonparametric two-sample hypothesis testing problem for random graphs. Bernoulli 23 (2017), no. 3, 1599--1630. doi:10.3150/15-BEJ789. https://projecteuclid.org/euclid.bj/1489737619
