## The Annals of Applied Statistics

### Prediction models for network-linked data

#### Abstract

Prediction algorithms typically assume the training data are independent samples, but in many modern applications samples come from individuals connected by a network. For example, in adolescent health studies of risk-taking behaviors, information on the subjects’ social network is often available and plays an important role through network cohesion, the empirically observed phenomenon of friends behaving similarly. Taking cohesion into account in prediction models should allow us to improve their performance. Here we propose a network-based penalty on individual node effects to encourage similarity between predictions for linked nodes, and show that incorporating it into prediction leads to improvement over traditional models both theoretically and empirically when network cohesion is present. The penalty can be used with many loss-based prediction methods, such as regression, generalized linear models, and Cox’s proportional hazards model. Applications to predicting levels of recreational activity and marijuana usage among teenagers from the AddHealth study, based on both demographic covariates and friendship networks, are discussed in detail. They show that our approach to taking friendships into account can significantly improve predictions of behavior while providing interpretable estimates of covariate effects.
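As a rough illustration of the kind of penalty described in the abstract, the sketch below fits a linear model with one effect per node and penalizes squared differences between the effects of linked nodes. The quadratic form of the penalty, the function names `fit_cohesion` and `cohesion_penalty`, the tuning parameter `lam`, and the toy data are all assumptions made for this example. It is not the authors' implementation.

```python
def solve(A, b):
    """Solve the dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system; is lam > 0 and the graph connected?")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def cohesion_penalty(alpha, edges):
    """Sum of squared differences of node effects over linked pairs (assumed penalty form)."""
    return sum((alpha[i] - alpha[j]) ** 2 for i, j in edges)


def fit_cohesion(y, X, edges, lam):
    """Minimize sum_i (y_i - alpha_i - x_i . beta)^2 + lam * cohesion_penalty(alpha, edges).

    Returns (alpha, beta): one effect per node and one coefficient per covariate.
    The minimizer solves the normal equations of this quadratic objective.
    """
    n, p = len(y), len(X[0])
    d = n + p
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    # Least-squares part: Z^T Z and Z^T y for the design Z = [I_n, X].
    for i in range(n):
        A[i][i] += 1.0
        b[i] += y[i]
        for j in range(p):
            A[i][n + j] += X[i][j]
            A[n + j][i] += X[i][j]
            b[n + j] += X[i][j] * y[i]
            for k in range(p):
                A[n + j][n + k] += X[i][j] * X[i][k]
    # Penalty part: lam times the graph Laplacian, acting on the alpha block only.
    for i, j in edges:
        A[i][i] += lam
        A[j][j] += lam
        A[i][j] -= lam
        A[j][i] -= lam
    theta = solve(A, b)
    return theta[:n], theta[n:]


# Toy data (hypothetical): four nodes on a path 0-1-2-3 and one covariate.
y = [1.0, 1.2, 3.0, 3.1]
X = [[0.5], [-0.5], [0.3], [-0.3]]
edges = [(0, 1), (1, 2), (2, 3)]
for lam in (0.1, 100.0):
    alpha, beta = fit_cohesion(y, X, edges, lam)
    print(lam, [round(a, 3) for a in alpha], [round(c, 3) for c in beta])
```

In this sketch a larger `lam` pulls the node effects of linked nodes closer together, so the fit moves toward an ordinary regression with a shared intercept. The solution is unique only when `lam` is positive, the graph is connected, and the covariate columns are linearly independent of the constant vector.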

#### Article information

Source
Ann. Appl. Stat., Volume 13, Number 1 (2019), 132-164.

Dates
Revised: June 2018
First available in Project Euclid: 10 April 2019

https://projecteuclid.org/euclid.aoas/1554861644

Digital Object Identifier
doi:10.1214/18-AOAS1205

Mathematical Reviews number (MathSciNet)
MR3937424

Zentralblatt MATH identifier
07057423

#### Citation

Li, Tianxi; Levina, Elizaveta; Zhu, Ji. Prediction models for network-linked data. Ann. Appl. Stat. 13 (2019), no. 1, 132–164. doi:10.1214/18-AOAS1205. https://projecteuclid.org/euclid.aoas/1554861644

#### Supplemental materials

• Supplement to “Prediction models for network-linked data”. The supplemental article provides proofs of the theoretical properties, an analysis of computational complexity, additional simulation examples under the logistic regression setting, and a sensitivity study of missing data imputation.