Electronic Journal of Statistics

Nonparametric distribution estimation in the presence of familial correlation and censoring

Kun Xu, Yanyuan Ma, and Yuanjia Wang

Full-text: Open access


We propose methods to estimate the distribution functions of multiple populations from mixture data, in which each observation is known to belong to a specific population only with a certain probability. The problem is motivated by kin-cohort studies that collect phenotype data in families for diseases such as Huntington’s disease (HD) or breast cancer. Relatives in these studies are not genotyped; hence only their probabilities of carrying a known causal mutation (e.g., a BRCA1 gene mutation or the HD gene mutation) can be derived. In addition, phenotype observations from the same family may be correlated due to shared lifestyle or other genes associated with the disease, and the observations are subject to censoring. Our estimator does not assume any parametric form for the distributions and does not require modeling the correlation structure. It estimates the distributions by constructing optimal base estimators and then optimally combining them. The optimality implies both estimation consistency and minimum estimation variance. Simulations and a real data analysis of an HD study illustrate the improved efficiency of the proposed methods.
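The idea of optimally combining base estimators can be illustrated with a generic sketch. The code below is not the authors' estimator. It only shows the textbook minimum-variance linear combination of several unbiased estimates of the same quantity, assuming their covariance matrix is known. The function name `optimal_combination` and all numbers are hypothetical and chosen purely for illustration.

```python
import numpy as np


def optimal_combination(estimates, cov):
    """Combine unbiased estimates of one quantity with minimum variance.

    Generic illustration (an assumption, not the paper's procedure):
    the weights w = cov^{-1} 1 / (1' cov^{-1} 1) sum to one and minimize
    w' cov w among all weight vectors that sum to one.
    """
    estimates = np.asarray(estimates, dtype=float)
    cov = np.asarray(cov, dtype=float)
    ones = np.ones_like(estimates)
    # Solve cov @ x = 1 rather than inverting the matrix explicitly.
    x = np.linalg.solve(cov, ones)
    weights = x / (ones @ x)
    combined = weights @ estimates
    variance = 1.0 / (ones @ x)
    return combined, variance, weights


# Hypothetical example: two correlated base estimates of a distribution
# function value F(t) at one fixed time point t.
base_estimates = [0.30, 0.36]
base_cov = [[0.04, 0.01],
            [0.01, 0.02]]
combined, variance, weights = optimal_combination(base_estimates, base_cov)
```

In this toy setting the combined estimate has a variance no larger than that of either base estimate, which is the sense in which a combination of this kind is called optimal. In practice the covariance would itself have to be estimated, for example by a bootstrap.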

Article information

Electron. J. Statist., Volume 11, Number 1 (2017), 1928-1948.

Received: May 2016
First available in Project Euclid: 3 May 2017

Primary: 62G08: Nonparametric regression
Secondary: 62N01: Censored data models

Bootstrap; efficiency; familial correlation; Huntington’s disease; mixed samples; quadratic inference function

Creative Commons Attribution 4.0 International License.


Xu, Kun; Ma, Yanyuan; Wang, Yuanjia. Nonparametric distribution estimation in the presence of familial correlation and censoring. Electron. J. Statist. 11 (2017), no. 1, 1928-1948. doi:10.1214/17-EJS1274. https://projecteuclid.org/euclid.ejs/1493776838

