Open Access
A provable smoothing approach for high dimensional generalized regression with applications in genomics
Fang Han, Hongkai Ji, Zhicheng Ji, Honglang Wang
Electron. J. Statist. 11(2): 4347-4403 (2017). DOI: 10.1214/17-EJS1352

Abstract

In many applications, linear models fit the data poorly. This article studies an appealing alternative, the generalized regression model. This model only assumes that there exists an unknown monotonically increasing link function connecting the response $Y$ to a single index $\boldsymbol{X}^{\mathsf{T}}\boldsymbol{\beta}^{*}$ of explanatory variables $\boldsymbol{X} \in \mathbb{R}^{d}$. The generalized regression model is flexible and covers many widely used statistical models. It fits the data generating mechanisms well in many real problems, which makes it useful in a variety of applications where regression models are regularly employed. In low dimensions, rank-based M-estimators are recommended to deal with the generalized regression model, giving root-$n$ consistent estimators of $\boldsymbol{\beta}^{*}$. Applications of these estimators to high dimensional data, however, are questionable. This article studies, both theoretically and practically, a simple yet powerful smoothing approach to handle the high dimensional generalized regression model. Theoretically, a family of smoothing functions is provided, and the amount of smoothing necessary for efficient inference is carefully calculated. Practically, our study is motivated by an important and challenging scientific problem: decoding gene regulation by predicting transcription factors that bind to cis-regulatory elements. Applying our proposed method to this problem shows substantial improvement over the state-of-the-art alternative in real data.
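
As a rough illustration of what smoothing a rank-based M-estimator can look like, the sketch below takes a pairwise rank correlation objective, $\sum_{i \neq j} 1(Y_i > Y_j)\, 1(\boldsymbol{X}_i^{\mathsf{T}}\boldsymbol{\beta} > \boldsymbol{X}_j^{\mathsf{T}}\boldsymbol{\beta})$, and replaces the second indicator with a logistic sigmoid so that gradient methods apply. Everything here is an assumption made for illustration, not the authors' estimator: the choice of objective, the sigmoid as the smoothing function, the smoothing level `alpha`, the unit-norm gradient ascent, the simulated data, and all function names. No sparsity penalty is included, so this is a low dimensional toy rather than the high dimensional procedure the abstract refers to.

```python
import numpy as np


def sigmoid(t):
    # Logistic function written via tanh so large |t| cannot overflow.
    return 0.5 * (1.0 + np.tanh(0.5 * t))


def smoothed_rank_objective(beta, X, y, alpha):
    """Average over ordered pairs (i, j) of
    1(y_i > y_j) * sigmoid(alpha * (x_i - x_j)^T beta)."""
    index = X @ beta
    gap = index[:, None] - index[None, :]           # gap[i, j] = (x_i - x_j)^T beta
    wins = (y[:, None] > y[None, :]).astype(float)  # wins[i, j] = 1(y_i > y_j)
    n = len(y)
    return float((wins * sigmoid(alpha * gap)).sum() / (n * (n - 1)))


def smoothed_rank_gradient(beta, X, y, alpha):
    """Gradient of smoothed_rank_objective with respect to beta."""
    index = X @ beta
    gap = index[:, None] - index[None, :]
    s = sigmoid(alpha * gap)
    # Pair weights: 1(y_i > y_j) * alpha * sigmoid'(alpha * gap).
    w = (y[:, None] > y[None, :]) * alpha * s * (1.0 - s)
    n = len(y)
    # sum_{i,j} w_ij (x_i - x_j) = (row sums - column sums) @ X
    return (w.sum(axis=1) - w.sum(axis=0)) @ X / (n * (n - 1))


def fit_direction(X, y, alpha=5.0, step=0.5, n_iter=300):
    """Gradient ascent with renormalization to the unit sphere.

    The ordering of x^T beta is unchanged by positive rescaling of beta,
    so only the direction of beta is estimated.
    """
    d = X.shape[1]
    beta = np.ones(d) / np.sqrt(d)
    for _ in range(n_iter):
        beta = beta + step * smoothed_rank_gradient(beta, X, y, alpha)
        beta /= np.linalg.norm(beta)
    return beta


def simulate(n=200, seed=0):
    """Toy data: y is an increasing (exponential) link of x^T beta_star plus noise."""
    rng = np.random.default_rng(seed)
    beta_star = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
    beta_star /= np.linalg.norm(beta_star)
    X = rng.standard_normal((n, beta_star.size))
    y = np.exp(X @ beta_star + 0.1 * rng.standard_normal(n))
    return X, y, beta_star


if __name__ == "__main__":
    X, y, beta_star = simulate()
    beta_hat = fit_direction(X, y)
    print("cosine similarity with beta_star:", float(beta_hat @ beta_star))
```

Because the objective depends on the response only through the pairwise comparisons `y[i] > y[j]`, applying any strictly increasing transformation to `y` leaves the fit unchanged, which is the sense in which the link function can remain unknown. Larger `alpha` brings the sigmoid closer to the indicator at the cost of a rougher objective; how much smoothing is appropriate is the question the abstract says the article analyzes.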

Citation

Fang Han, Hongkai Ji, Zhicheng Ji, Honglang Wang. "A provable smoothing approach for high dimensional generalized regression with applications in genomics." Electron. J. Statist. 11 (2) 4347-4403, 2017. https://doi.org/10.1214/17-EJS1352

Information

Received: 1 December 2016; Published: 2017
First available in Project Euclid: 16 November 2017

zbMATH: 06816619
MathSciNet: MR3724223
Digital Object Identifier: 10.1214/17-EJS1352

Subjects:
Primary: 47N30

Keywords: generalized regression model, rank-based M-estimator, semiparametric regression, smoothing approximation, transcription factor binding
