The Annals of Applied Statistics

Semiparametric regression in testicular germ cell data

Anastasia Voulgaraki, Benjamin Kedem, and Barry I. Graubard

Full-text: Open access


It is possible to approach regression analysis with random covariates from a semiparametric perspective where information is combined from multiple multivariate sources. The approach assumes a semiparametric density ratio model where multivariate distributions are “regressed” on a reference distribution. A kernel density estimator can be constructed from many data sources in conjunction with the semiparametric model. The estimator is shown to be more efficient than the traditional single-sample kernel density estimator, and its optimal bandwidth is discussed in some detail. Each multivariate distribution and the corresponding conditional expectation (regression) of interest are estimated from the combined data using all sources. Graphical and quantitative diagnostic tools are suggested to assess model validity. The method is applied in quantifying the effect of height and age on weight of germ cell testicular cancer patients. Comparisons are made with multiple regression, generalized additive models (GAM) and nonparametric kernel regression.
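As a rough sketch of the model class the abstract refers to, a density ratio model is commonly written as an exponential tilt of a reference density, and the regression of interest is the conditional expectation under each estimated joint density. The notation below ($g$, $g_j$, $\alpha_j$, $\boldsymbol{\beta}_j$, $h$, $q$, $\mathbf{z}$) is assumed for illustration and is not taken from this page. The paper's exact specification may differ.

```latex
% Density ratio model in its common exponential-tilt form (notation assumed).
% g is the reference density; g_1, ..., g_q are the densities of the other
% sources; z = (y, x) stacks the response y and the covariate vector x.
\[
  \frac{g_j(\mathbf{z})}{g(\mathbf{z})}
    = \exp\{\alpha_j + \boldsymbol{\beta}_j^{\top} h(\mathbf{z})\},
  \qquad j = 1, \dots, q .
\]
% Regression as a conditional expectation under the joint density g_j(y, x):
\[
  E_j[\, y \mid \mathbf{x} \,]
    = \frac{\int y \, g_j(y, \mathbf{x}) \, dy}
           {\int g_j(y, \mathbf{x}) \, dy} .
\]
```

Here $h$ is a known function chosen by the analyst. Once each $g_j$ has been estimated from the pooled data, an estimate of the regression follows by substituting it into the second expression.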

Ann. Appl. Stat., Volume 6, Number 3 (2012), 1185-1208.

First available in Project Euclid: 31 August 2012

Keywords: Multivariate density ratio model; kernel; random covariates; diagnostic; Nadaraya–Watson; GAM


Voulgaraki, Anastasia; Kedem, Benjamin; Graubard, Barry I. Semiparametric regression in testicular germ cell data. Ann. Appl. Stat. 6 (2012), no. 3, 1185-1208. doi:10.1214/12-AOAS552.

Supplemental materials

  • Supplementary material: Supplement to “Semiparametric regression in testicular germ cell data”. The supplementary material contains the mathematical proofs of the lemmas, corollaries and theorems supporting the statements and results, along with some additional references.