The Annals of Applied Statistics

Modeling within-household associations in household panel studies

Fiona Steele, Paul S. Clarke, and Jouni Kuha

Full-text: Open access


Household panel data provide valuable information about the extent of similarity in coresidents’ attitudes and behaviours. However, existing analysis approaches do not allow for the complex association structures that arise due to changes in household composition over time. We propose a flexible marginal modeling approach where the changing correlation structure between individuals is modeled directly and the parameters estimated using second-order generalized estimating equations (GEE2). A key component of our correlation model specification is the “superhousehold”, a form of social network in which pairs of observations from different individuals are connected (directly or indirectly) by coresidence. These superhouseholds partition observations into clusters with nonstandard and highly variable correlation structures. We thus conduct a simulation study to evaluate the accuracy and stability of GEE2 for these models. Our approach is then applied in an analysis of individuals’ attitudes towards gender roles using British Household Panel Survey data. We find strong evidence of between-individual correlation before, during and after coresidence, with large differences among spouses, parent–child, other family, and unrelated pairs. Our results suggest that these dependencies are due to a combination of nonrandom sorting and causal effects of coresidence.
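One way to read the superhousehold definition is as a connected component of a graph whose nodes are individuals and whose edges join people who share a household at the same wave. The following is a minimal sketch under that reading, not the authors' own construction (their supplement provides Stata code for it). The function name `superhousehold_ids`, the `(person_id, wave, household_id)` record layout, and the choice to label each component by its smallest person ID are all assumptions made for illustration. The sketch also assumes that every observation of a person inherits that person's label.

```python
def superhousehold_ids(records):
    """Map each person to a superhousehold label (hypothetical helper).

    `records` is an iterable of (person_id, wave, household_id) tuples.
    Two people are linked directly if they share a household at the same
    wave; superhouseholds are the connected components of that graph.
    Person IDs must be mutually comparable, because each component is
    labelled by its smallest person ID.
    """
    parent = {}

    def find(x):
        # Return the current root of x, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Merge two components, keeping the smaller root as the label.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_member = {}  # (wave, household_id) -> first person seen there
    for person, wave, household in records:
        find(person)  # register people who never coreside with anyone
        key = (wave, household)
        if key in first_member:
            union(person, first_member[key])
        else:
            first_member[key] = person

    return {p: find(p) for p in parent}


# Toy panel: persons 1 and 2 coreside at wave 1, persons 2 and 3 at wave 2,
# so 1 and 3 are connected indirectly. Person 4 always lives alone.
records = [
    (1, 1, "A"), (2, 1, "A"), (3, 1, "B"),
    (1, 2, "A"), (2, 2, "C"), (3, 2, "C"),
    (4, 1, "D"), (4, 2, "D"),
]
ids = superhousehold_ids(records)
```

In the toy panel, persons 1 and 3 never live together, yet they fall in the same cluster because each coresides with person 2 at some wave. This indirect linkage is what makes cluster sizes and correlation structures so variable.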

Article information

Ann. Appl. Stat., Volume 13, Number 1 (2019), 367–392.

Received: September 2017
Revised: April 2018
First available in Project Euclid: 10 April 2019

Keywords: Household effects; household correlation; longitudinal households; homophily; multiple membership multilevel model; marginal model; generalised estimating equations


Steele, Fiona; Clarke, Paul S.; Kuha, Jouni. Modeling within-household associations in household panel studies. Ann. Appl. Stat. 13 (2019), no. 1, 367–392. doi:10.1214/18-AOAS1189.



Supplemental materials

  • Supplementary information, analysis and code. The supplement includes a descriptive analysis of events leading to household change, further details on superhouseholds, Stata code for the construction of superhousehold IDs, additional simulation results, a discussion of positive definite correlation matrices, and details on data structures and model estimation in R using geepack.