Open Access
Regularizing double machine learning in partially linear endogenous models
Corinne Emmenegger, Peter Bühlmann
Electron. J. Statist. 15(2): 6461-6543 (2021). DOI: 10.1214/21-EJS1931


The linear coefficient in a partially linear model with confounding variables can be estimated using double machine learning (DML). However, this DML estimator has a two-stage least squares (TSLS) interpretation and may produce overly wide confidence intervals. To address this issue, we propose a regularization and selection scheme, regsDML, which leads to narrower confidence intervals. It selects either the TSLS DML estimator or a regularization-only estimator depending on whose estimated variance is smaller. The regularization-only estimator is tailored to have a low mean squared error. The regsDML estimator is fully data driven. The regsDML estimator converges at the parametric rate, is asymptotically Gaussian distributed, and asymptotically equivalent to the TSLS DML estimator, but regsDML exhibits substantially better finite sample properties. The regsDML estimator uses the idea of k-class estimators, and we show how DML and k-class estimation can be combined to estimate the linear coefficient in a partially linear endogenous model. Empirical examples demonstrate our methodological and theoretical developments. Software code for our regsDML method is available in the R-package dmlalg.
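The k-class idea mentioned in the abstract can be illustrated in a plain linear instrumental-variables model. The following is a minimal sketch under stated assumptions, not the authors' regsDML procedure and not the dmlalg interface: it omits the machine-learning adjustment for the nonparametric part, the data-driven choice of the regularization, and the variance-based selection step. The function name `k_class`, the simulated data, and the coefficient value 2.0 are hypothetical choices for illustration. The estimator interpolates between ordinary least squares (k = 0) and TSLS (k = 1).

```python
import numpy as np

def k_class(y, x, z, k):
    """k-class estimate of beta in y = x @ beta + error, with instruments z.

    k = 0 gives ordinary least squares, k = 1 gives two-stage least squares.
    """
    n = len(y)
    p_z = z @ np.linalg.pinv(z)             # projection onto the column space of z
    w = np.eye(n) - k * (np.eye(n) - p_z)   # I - k * M_z, with M_z = I - P_z
    return np.linalg.solve(x.T @ w @ x, x.T @ w @ y)

# Simulated data (illustrative assumption): one instrument, one hidden confounder.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))                  # instrument
h = rng.normal(size=n)                       # hidden confounder, affects x and y
x = (z[:, 0] + h + rng.normal(size=n)).reshape(-1, 1)
y = 2.0 * x[:, 0] + h + rng.normal(size=n)   # coefficient set to 2.0 by construction

beta_ols = k_class(y, x, z, k=0.0)    # ignores the confounding
beta_tsls = k_class(y, x, z, k=1.0)   # uses only instrument-driven variation
beta_mid = k_class(y, x, z, k=0.5)    # a regularized compromise between the two
```

Values of k between 0 and 1 trade some of the bias protection of TSLS for lower variance, which is the kind of compromise the regularization-only estimator in the abstract is designed to exploit.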

Funding Statement

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 786461).


Acknowledgments

We thank Matthias Löffler, the editor, associate editor, and anonymous reviewers for constructive comments.


Citation

Corinne Emmenegger, Peter Bühlmann. "Regularizing double machine learning in partially linear endogenous models." Electron. J. Statist. 15 (2) 6461-6543, 2021.


Received: 1 January 2021; Published: 2021
First available in Project Euclid: 27 December 2021

Digital Object Identifier: 10.1214/21-EJS1931

Keywords: double machine learning, endogenous variables, generalized method of moments, instrumental variables, k-class estimation, partially linear model, regularization, semiparametric estimation, two-stage least squares
