Matching Methods for Observational Studies Derived from Large Administrative Databases
Ruoqi Yu, Jeffrey H. Silber, Paul R. Rosenbaum
Statist. Sci. 35(3): 338-355 (August 2020). DOI: 10.1214/19-STS699


We propose new optimal matching techniques for large administrative data sets. In current practice, very large matched samples are constructed by subdividing the population and solving a series of smaller problems, for instance, matching men to men and separately matching women to women. Without simplification of some kind, the time required to optimally match $T$ treated individuals to $T$ controls selected from $C\geq T$ potential controls grows much faster than linearly with the number of people to be matched—the required time is of order $O\{(T+C)^{3}\}$—so splitting one large problem into many small problems greatly accelerates the computations. This common practice has several disadvantages that we describe. In its place, we propose a single match, using everyone, that accelerates the computations in a different way. In particular, we use an iterative form of Glover’s algorithm for a doubly convex bipartite graph to determine an optimal caliper for the propensity score, radically reducing the number of candidate matches; then we optimally match in a large but much sparser graph. In this graph, a modified form of near-fine balance can be used on a much larger scale, improving its effectiveness. We illustrate the method using data from US Medicaid, matching children receiving surgery at a children’s hospital to similar children receiving surgery at a hospital that mostly treats adults. In the example, we form 38,841 matched pairs from 159,527 potential controls, controlling for 29 covariates plus 463 Principal Surgical Procedures, plus 973 Principal Diagnoses. The method is implemented in an $\mathtt{R}$ package $\mathtt{bigmatch}$ available from $\mathtt{CRAN}$.
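The caliper step described in the abstract can be illustrated with a minimal sketch. It is not the `bigmatch` implementation. It assumes each person is summarized by a single score (standing in for the propensity score), so that a caliper links each treated unit to a contiguous run of controls once both groups are sorted. A greedy pass in the spirit of Glover's algorithm then checks whether every treated unit can receive its own control, and a binary search finds the smallest caliper for which that is possible. The function names `caliper_is_feasible` and `smallest_feasible_caliper` are hypothetical, and enumerating all pairwise distances is a shortcut that suits only toy inputs.

```python
import heapq


def caliper_is_feasible(treated, controls, caliper):
    """Return True if every treated score can be paired with a distinct
    control whose score differs from it by at most `caliper`."""
    treated, controls = sorted(treated), sorted(controls)
    n = len(controls)

    # With sorted scores, the controls within the caliper of each treated
    # unit form an interval [lo, hi] of control indices, and both ends move
    # only to the right as the treated score increases.
    intervals = []
    lo, hi = 0, -1
    for t in treated:
        while lo < n and t - controls[lo] > caliper:
            lo += 1
        hi = max(hi, lo - 1)
        while hi + 1 < n and controls[hi + 1] - t <= caliper:
            hi += 1
        if lo > hi:
            return False  # no control at all within the caliper
        intervals.append((lo, hi))

    # Greedy pass over controls: give each control to the waiting treated
    # unit whose interval ends soonest.
    waiting = []  # min-heap of interval right ends
    k = 0
    for j in range(n):
        while k < len(intervals) and intervals[k][0] <= j:
            heapq.heappush(waiting, intervals[k][1])
            k += 1
        if waiting:
            if waiting[0] < j:
                return False  # an interval closed before its unit was paired
            heapq.heappop(waiting)
    return not waiting


def smallest_feasible_caliper(treated, controls):
    """Binary-search the smallest caliper that still allows a full pair match.

    Assumption for illustration: candidates are all pairwise distances,
    which costs O(T * C) and is only reasonable for small examples.
    """
    candidates = sorted({abs(t - c) for t in treated for c in controls})
    if not candidates or not caliper_is_feasible(treated, controls, candidates[-1]):
        return None
    left, right = 0, len(candidates) - 1
    while left < right:
        mid = (left + right) // 2
        if caliper_is_feasible(treated, controls, candidates[mid]):
            right = mid
        else:
            left = mid + 1
    return candidates[left]
```

For example, with integer scores `treated = [30, 31, 32]` and `controls = [10, 29, 33, 50, 90]`, a caliper of 3 leaves three treated units competing for two controls, so the search has to widen the caliper until a third control comes within reach. Once a caliper is fixed, only pairs inside it need to be kept as edges for the subsequent optimal match, which is what makes the graph sparse.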



Ruoqi Yu, Jeffrey H. Silber, Paul R. Rosenbaum. "Matching Methods for Observational Studies Derived from Large Administrative Databases." Statist. Sci. 35(3): 338-355, August 2020.


Published: August 2020
First available in Project Euclid: 11 September 2020

MathSciNet: MR4148206
Digital Object Identifier: 10.1214/19-STS699

Keywords: causal inference, fine balance, Glover’s algorithm, observational study, optimal caliper, optimal matching, propensity score

Rights: Copyright © 2020 Institute of Mathematical Statistics
