The Annals of Applied Statistics

Bayesian semiparametric inference for multivariate doubly-interval-censored data

Alejandro Jara, Emmanuel Lesaffre, Maria De Iorio, and Fernando Quintana

Full-text: Open access


Based on a data set obtained in a dental longitudinal study conducted in Flanders (Belgium), the joint distribution of the times to caries of the permanent first molars was modeled as a function of covariates. This requires an analysis of multivariate continuous doubly-interval-censored data because (i) the emergence time of a tooth and the time at which it develops caries were recorded only at yearly examinations, so each is known only to lie within an interval, and (ii) events on teeth of the same child are dependent. To model the joint distribution of the emergence times and the times to caries, we propose a dependent Bayesian semiparametric model. A major feature of the proposed approach is that survival curves can be estimated without imposing assumptions such as proportional hazards, additive hazards, proportional odds or accelerated failure time.
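To make the doubly-interval-censored structure concrete, consider a single tooth. Its emergence time E is known only to lie in an interval (e_L, e_R], and its caries time E + T, where T is the time from emergence to caries, is known only to lie in (c_L, c_R]. Under the simplifying assumption that E and T are independent with known distributions, the probability of the observed pair of intervals is ∫ f_E(e) [F_T(c_R − e) − F_T(c_L − e)] de, integrated over (e_L, e_R]. The Python sketch below evaluates this integral numerically. The uniform emergence density, the exponential time-to-caries distribution, the function names and the example ages are all illustrative assumptions; this is not the paper's semiparametric model, which places a dependent nonparametric prior on these distributions and models the teeth of a child jointly.

```python
import math
from typing import Callable


def doubly_interval_censored_prob(
    e_left: float,
    e_right: float,
    c_left: float,
    c_right: float,
    onset_pdf: Callable[[float], float],
    duration_cdf: Callable[[float], float],
    n_steps: int = 2000,
) -> float:
    """P(e_left < E <= e_right, c_left < E + T <= c_right) for independent E, T.

    Illustrative sketch (hypothetical helper): E is the onset time (tooth
    emergence), T the duration (emergence to caries), E + T the event time.
    Integrates f_E(e) * [F_T(c_right - e) - F_T(c_left - e)] over
    (e_left, e_right] with the composite midpoint rule.
    """
    if e_right <= e_left:
        raise ValueError("onset interval must have positive length")
    h = (e_right - e_left) / n_steps
    total = 0.0
    for i in range(n_steps):
        e = e_left + (i + 0.5) * h  # midpoint of the i-th sub-interval
        # Probability that the event time E + T falls in (c_left, c_right]
        # given E = e; c_right may be math.inf for a right-censored event.
        mass = duration_cdf(c_right - e) - duration_cdf(c_left - e)
        total += onset_pdf(e) * mass
    return total * h


def exponential_cdf(rate: float) -> Callable[[float], float]:
    """Assumed Exponential(rate) duration CDF; zero for non-positive times."""
    return lambda t: 0.0 if t <= 0 else 1.0 - math.exp(-rate * t)


def uniform_pdf(lo: float, hi: float) -> Callable[[float], float]:
    """Assumed Uniform(lo, hi) onset density."""
    return lambda e: 1.0 / (hi - lo) if lo <= e <= hi else 0.0


if __name__ == "__main__":
    # Hypothetical tooth: emerged between ages 6 and 7,
    # caries first recorded between ages 8 and 9.
    p = doubly_interval_censored_prob(
        6.0, 7.0, 8.0, 9.0,
        onset_pdf=uniform_pdf(0.0, 10.0),
        duration_cdf=exponential_cdf(0.5),
    )
    print(f"likelihood contribution: {p:.6f}")
```

For this exponential choice the integral also has a closed form, 0.1 (1 − e^(−λ))² e^(−λ) / λ with λ = 0.5, which the numerical result should match closely. A tooth still caries-free at its last visit is handled by setting c_R to infinity.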

Article information

Ann. Appl. Stat., Volume 4, Number 4 (2010), 2126-2149.

First available in Project Euclid: 4 January 2011

Keywords: Multivariate doubly-interval-censored data; Bayesian nonparametrics; linear dependent Poisson–Dirichlet prior; linear dependent Dirichlet process prior


Jara, Alejandro; Lesaffre, Emmanuel; De Iorio, Maria; Quintana, Fernando. Bayesian semiparametric inference for multivariate doubly-interval-censored data. Ann. Appl. Stat. 4 (2010), no. 4, 2126–2149. doi:10.1214/10-AOAS368.

References

  • Bacchetti, P. and Jewell, N. P. (1991). Nonparametric estimation of the incubation period of AIDS based on a prevalent cohort with unknown infection times. Biometrics 47 947–960.
  • Carlton, M. A. (1999). Applications of the two-parameter Poisson–Dirichlet distribution. Unpublished doctoral thesis, Univ. California, Los Angeles.
  • Caron, F., Davy, M., Doucet, A., Duflos, E. and Vanheeghe, P. (2008). Bayesian inference for linear dynamic models with Dirichlet process mixtures. IEEE Transactions on Signal Processing 56 71–84.
  • Dahl, D. (2005). Sequentially-allocated merge-split sampler for conjugate and nonconjugate Dirichlet process mixture models. Technical report, Dept. Statistics, Texas A&M University.
  • De Gruttola, V. and Lagakos, S. W. (1989). Analysis of doubly-censored survival data, with application to AIDS. Biometrics 45 1–11.
  • De Iorio, M., Müller, P., Rosner, G. L. and MacEachern, S. N. (2004). An ANOVA model for dependent random measures. J. Amer. Statist. Assoc. 99 205–215.
  • De Iorio, M., Johnson, W. O., Müller, P. and Rosner, G. L. (2009). Bayesian nonparametric nonproportional hazards survival modelling. Biometrics 65 762–771.
  • De la Cruz, R., Quintana, F. A. and Müller, P. (2007). Semiparametric Bayesian classification with longitudinal markers. Appl. Statist. 56 119–137.
  • De Vos, E. and Vanobbergen, J. (2006). Caries prevalence in Belgian children: A review. Arch. Public Health 64 217–229.
  • Duan, J. A., Guindani, M. and Gelfand, A. E. (2007). Generalized spatial Dirichlet process models. Biometrika 94 809–825.
  • Dunson, D. B. and Herring, A. H. (2006). Semiparametric Bayesian latent trajectory models. Technical report, ISDS Discussion Paper 16, Duke Univ., Durham, NC, USA.
  • Dunson, D. B. and Park, J. H. (2008). Kernel stick-breaking processes. Biometrika 95 307–323.
  • Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1 209–230.
  • Gelfand, A. E., Kottas, A. and MacEachern, S. N. (2005). Bayesian nonparametric spatial modeling with Dirichlet process mixing. J. Amer. Statist. Assoc. 100 1021–1035.
  • Goggins, W. B., Finkelstein, D. M. and Zaslavsky, A. M. (1999). Applying the Cox proportional hazards model for analysis of latency data with interval censoring. Stat. Med. 18 2737–2747.
  • Gómez, G. and Calle, M. L. (1999). Non-parametric estimation with doubly censored data. J. Appl. Statist. 26 45–58.
  • Gómez, G. and Lagakos, S. W. (1994). Estimation of the infection time and latency distribution of AIDS with doubly censored data. Biometrics 50 204–212.
  • Griffin, J. E. and Steel, M. F. J. (2006). Order-based dependent Dirichlet processes. J. Amer. Statist. Assoc. 101 179–194.
  • Ishwaran, H. and James, L. F. (2001). Gibbs sampling methods for stick-breaking priors. J. Amer. Statist. Assoc. 96 161–173.
  • Ishwaran, H. and James, L. F. (2003). Generalized weighted Chinese restaurant processes for species sampling mixture models. Statist. Sinica 13 1211–1235.
  • Jara, A. (2007). Applied Bayesian non- and semi-parametric inference using DPpackage. R News 7 17–26.
  • Jara, A., Lesaffre, E., De Iorio, M. and Quintana, F. A. (2010a). Supplement A to “Bayesian semiparametric inference for multivariate doubly-interval-censored data.” DOI: 10.1214/10-AOAS368SUPPA.
  • Jara, A., Lesaffre, E., De Iorio, M. and Quintana, F. A. (2010b). Supplement B to “Bayesian semiparametric inference for multivariate doubly-interval-censored data.” DOI: 10.1214/10-AOAS368SUPPB.
  • Jeffreys, H. (1961). The Theory of Probability, 3rd ed. Oxford University Press, Oxford, UK.
  • Kim, M. Y., De Gruttola, V. G. and Lagakos, S. W. (1993). Analyzing doubly censored data with covariates, with application to AIDS. Biometrics 49 13–22.
  • Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. J. Amer. Statist. Assoc. 103 523–533.
  • Komárek, A., Lesaffre, E., Härkänen, T., Declerck, D. and Virtanen, J. I. (2005). A Bayesian analysis of multivariate doubly-interval-censored dental data. Biostatistics 6 145–155.
  • Korwar, R. M. and Hollander, M. (1973). Contributions to the theory of Dirichlet processes. Ann. Probab. 1 705–711.
  • Lang, S. and Brezger, A. (2004). Bayesian P-splines. J. Comput. Graph. Statist. 13 183–212.
  • Leroy, R., Bogaerts, K., Lesaffre, E. and Declerck, D. (2005). Effect of caries experience in primary molars on cavity formation in the adjacent permanent first molar. Caries Res. 39 342–349.
  • Lijoi, A., Mena, R. H. and Prünster, I. (2007a). A Bayesian nonparametric method for prediction in EST analysis. BMC Bioinformatics 8 339–360.
  • Lijoi, A., Mena, R. H. and Prünster, I. (2007b). Bayesian nonparametric estimation of the probability of discovering new species. Biometrika 94 769–786.
  • Lijoi, A., Mena, R. H. and Prünster, I. (2008). A Bayesian nonparametric approach for comparing clustering structures in EST libraries. J. Comput. Biol. 15 1315–1327.
  • MacEachern, S. N. (1999). Dependent nonparametric processes. In ASA Proceedings of the Section on Bayesian Statistical Science 50–55. Amer. Statist. Assoc., Alexandria, VA.
  • MacEachern, S. N. (2000). Dependent Dirichlet processes. Technical report, Dept. Statistics, Ohio State Univ.
  • Marthaler, T. M., O’Mullane, D. M. and Vrbic, V. (1996). The prevalence of dental caries in Europe 1990–1995. Caries Res. 30 237–255.
  • Müller, P., Quintana, F. A. and Rosner, G. (2004). A method for combining inference across related nonparametric Bayesian models. J. Roy. Statist. Soc. Ser. B 66 735–749.
  • Navarrete, C., Quintana, F. A. and Müller, P. (2008). Some issues on nonparametric Bayesian modeling using species sampling models. Statist. Modell. 8 3–21.
  • Pan, W. (2001). A multiple imputation approach to regression analysis for doubly censored data with application to AIDS studies. Biometrics 57 1245–1250.
  • Petersson, G. H. and Bratthall, D. (1996). The caries decline: A review of reviews. Eur. J. Oral Sci. 104 436–443.
  • Pitman, J. (1996). Some developments of the Blackwell–MacQueen urn scheme. In Statistics, Probability and Game Theory. Papers in Honor of David Blackwell (T. S. Ferguson, L. S. Shapley and J. B. MacQueen, eds.). IMS Lecture Notes—Monograph Series 245–268. Hayward, CA.
  • Pitman, J. and Yor, M. (1997). The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25 855–900.
  • Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statist. Sinica 4 639–650.
  • Sun, J. (1995). Empirical estimation of a distribution function with truncated and doubly interval-censored data and its application to AIDS studies. Biometrics 51 1096–1104.
  • Sun, J., Liao, Q. and Pagano, M. (1999). Regression analysis of doubly censored failure time data with application to AIDS studies. Biometrics 55 909–914.
  • Sun, J., Lim, H.-J. and Zhao, X. (2004). An independence test for doubly censored failure time data. Biom. J. 46 503–511.
  • Vanobbergen, J., Martens, L., Lesaffre, E. and Declerck, D. (2000). The Signal Tandmobiel project, a longitudinal intervention health promotion study in Flanders (Belgium): Baseline and first year results. Eur. J. Paediat. Dent. 1 87–96.
  • Willems, S., Vanobbergen, J., Martens, L. and De Maeseneer, J. (2005). The independent impact of household and neighborhood-based social determinants on early childhood caries. Family and Community Health 28 168–175.

Supplemental materials

  • Supplementary material A: MCMC schemes for posterior computation. This supplement gives a complete description of the full conditional distributions for the marginal and conditional MCMC algorithms used to fit the linear dependent Poisson–Dirichlet (LDPD) survival model to doubly-interval-censored data.
  • Supplementary material B: The HIV-AIDS data. This supplement presents an analysis of the data set considered by De Gruttola and Lagakos (1989), which allows the LDPD model to be compared with the one-sample nonparametric maximum likelihood estimator they proposed. The data set contains information on a cohort of hemophiliacs at risk of human immunodeficiency virus (HIV) infection from blood infusions they received periodically at two hospitals in France to treat their hemophilia. In this cohort, both the time of HIV infection and the onset of acquired immunodeficiency syndrome (AIDS) or other clinical symptoms could be censored. Therefore, the induction time between infection and clinical AIDS is treated as doubly censored.