## The Annals of Applied Statistics

### Shape-constrained uncertainty quantification in unfolding steeply falling elementary particle spectra

#### Abstract

The high energy physics unfolding problem is an important statistical inverse problem in data analysis at the Large Hadron Collider (LHC) at CERN. The goal of unfolding is to make nonparametric inferences about a particle spectrum from measurements smeared by the finite resolution of the particle detectors. Previous unfolding methods use ad hoc discretization and regularization, resulting in confidence intervals that can have significantly lower coverage than their nominal level. Instead of regularizing using a roughness penalty or stopping iterative methods early, we impose physically motivated shape constraints: positivity, monotonicity, and convexity. We quantify the uncertainty by constructing a nonparametric confidence set for the true spectrum, consisting of all those spectra that satisfy the shape constraints and that predict the observations within an appropriately calibrated level of fit. Projecting that set produces simultaneous confidence intervals for all functionals of the spectrum, including averages within bins. The confidence intervals have guaranteed conservative frequentist finite-sample coverage in the important and challenging class of unfolding problems for steeply falling particle spectra. We demonstrate the method using simulations that mimic unfolding the inclusive jet transverse momentum spectrum at the LHC. The shape-constrained intervals provide usefully tight conservative inferences, while the conventional methods suffer from severe undercoverage.
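The confidence set described in the abstract can be illustrated with a small membership check. This is only a sketch under loudly stated assumptions, not the authors' implementation: the spectrum is replaced by its values `f` on a coarse, equally spaced grid, the smearing is a hypothetical matrix `K`, and "an appropriately calibrated level of fit" is taken to mean that each predicted bin mean falls inside an exact (Garwood-type) Poisson interval with a Bonferroni correction across bins. The numbers in `K` and `y` are invented for illustration, and the grid discretization does not reproduce the paper's conservative treatment of the full nonparametric problem.

```python
import math

def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Poisson(lam), by direct summation (small counts only)."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return min(total, 1.0)

def solve_monotone(g, lo, hi, iters=100):
    """Bisection for g(x) = 0, assuming g is increasing with g(lo) < 0 <= g(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def garwood_interval(y, alpha):
    """Exact two-sided interval for a Poisson mean given count y, level 1 - alpha."""
    tail = alpha / 2
    if y == 0:
        lower = 0.0
    else:
        # Lower end solves P(Y >= y | lam) = tail, which increases in lam.
        lower = solve_monotone(
            lambda lam: (1.0 - poisson_cdf(y - 1, lam)) - tail, 0.0, float(y))
    # Upper end solves P(Y <= y | lam) = tail; widen the bracket until it holds.
    hi = y + 1.0
    while poisson_cdf(y, hi) > tail:
        hi *= 2
    upper = solve_monotone(lambda lam: tail - poisson_cdf(y, lam), 0.0, hi)
    return lower, upper

def satisfies_shape(f, tol=1e-12):
    """Positivity, monotone decrease, and convexity on an equally spaced grid."""
    positive = all(v >= -tol for v in f)
    decreasing = all(b <= a + tol for a, b in zip(f, f[1:]))
    convex = all(a - 2 * b + c >= -tol for a, b, c in zip(f, f[1:], f[2:]))
    return positive and decreasing and convex

def in_confidence_set(f, K, y, alpha=0.05):
    """True if f obeys the shape constraints and K f fits every bin of y."""
    if not satisfies_shape(f):
        return False
    n_bins = len(y)
    for row, count in zip(K, y):
        mu = sum(k * v for k, v in zip(row, f))               # predicted smeared mean
        lo, hi = garwood_interval(count, alpha / n_bins)      # Bonferroni (assumption)
        if not (lo <= mu <= hi):
            return False
    return True

# Hypothetical smearing matrix (3 observed bins x 4 grid points) and toy counts.
K = [[0.8, 0.2, 0.0, 0.0],
     [0.2, 0.6, 0.2, 0.0],
     [0.0, 0.2, 0.6, 0.2]]
y = [90, 45, 20]

steep_convex = [100.0, 40.0, 15.0, 5.0]   # decreasing and convex, fits the counts
not_convex = [100.0, 90.0, 15.0, 5.0]     # violates convexity
poor_fit = [300.0, 100.0, 30.0, 5.0]      # right shape, but predicts far too many events
```

In this discretized sketch, both the shape constraints and the fit constraints are linear in `f`, so an interval for a linear functional such as a bin average would follow from minimizing and maximizing that functional over the set with a linear programming solver. That optimization step is not shown here.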

#### Article information

Source
Ann. Appl. Stat., Volume 11, Number 3 (2017), 1671–1710.

Dates
Revised: April 2017
First available in Project Euclid: 5 October 2017

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1507168844

Digital Object Identifier
doi:10.1214/17-AOAS1053

Mathematical Reviews number (MathSciNet)
MR3709574

Zentralblatt MATH identifier
1380.62274

#### Citation

Kuusela, Mikael; Stark, Philip B. Shape-constrained uncertainty quantification in unfolding steeply falling elementary particle spectra. Ann. Appl. Stat. 11 (2017), no. 3, 1671–1710. doi:10.1214/17-AOAS1053. https://projecteuclid.org/euclid.aoas/1507168844
