Concentration inequalities for two-sample rank processes with application to bipartite ranking

Stephan Clémençon; Myrto Limnios; Nicolas Vayatis

doi:10.1214/21-EJS1907

Electron. J. Statist. 15(2): 4659-4717 (2021). DOI: 10.1214/21-EJS1907

Abstract

The $\mathrm{ROC}$ curve is the gold standard for measuring how well a test/scoring statistic discriminates between two statistical populations, in applications ranging from anomaly detection in signal processing to information retrieval and medical diagnosis. Most practical performance measures used in scoring/ranking applications, such as the $\mathrm{AUC}$, the local $\mathrm{AUC}$, the p-norm push, the DCG and others, can be viewed as summaries of the $\mathrm{ROC}$ curve. This paper highlights the fact that most of these empirical criteria can be expressed as two-sample linear rank statistics, and proves concentration inequalities for collections of such random variables, referred to here as two-sample rank processes, when indexed by VC classes of scoring functions. Based on these nonasymptotic bounds, the generalization capacity of empirical maximizers of a wide class of ranking performance criteria is then investigated from a theoretical perspective. The analysis is also supported by empirical evidence from numerical experiments.
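As an illustration of the abstract's statement that criteria such as the $\mathrm{AUC}$ can be written as two-sample linear rank statistics, the sketch below computes the empirical $\mathrm{AUC}$ in two ways: by counting correctly ordered positive/negative pairs, and from the ranks of the positive scores in the pooled sample (the classical Mann-Whitney-Wilcoxon identity). It is a minimal sketch, not the authors' code. The function names, the toy scores, the normalization of ranks by $N+1$, and the weighting function `phi` are assumptions made for the example, and ties between scores are assumed absent.

```python
# Illustrative sketch only: names, toy data and the rank normalization are
# assumptions for this example, not taken from the paper's code.

def positive_ranks(pos_scores, neg_scores):
    """Ranks (1 = smallest) of the positive scores in the pooled sample.

    Assumes all scores are distinct (no ties).
    """
    pooled = sorted(pos_scores + neg_scores)
    rank_of = {score: i + 1 for i, score in enumerate(pooled)}
    return [rank_of[score] for score in pos_scores]


def linear_rank_statistic(pos_scores, neg_scores, phi):
    """Sum of phi(rank / (N + 1)) over the positive sample, N = pooled size."""
    n_total = len(pos_scores) + len(neg_scores)
    ranks = positive_ranks(pos_scores, neg_scores)
    return sum(phi(r / (n_total + 1)) for r in ranks)


def auc_from_ranks(pos_scores, neg_scores):
    """Empirical AUC recovered from the rank statistic with phi(u) = u."""
    n, m = len(pos_scores), len(neg_scores)
    w = linear_rank_statistic(pos_scores, neg_scores, lambda u: u)
    rank_sum = w * (n + m + 1)  # undo the 1 / (N + 1) normalization
    # Mann-Whitney-Wilcoxon identity: U = rank_sum - n (n + 1) / 2
    return (rank_sum - n * (n + 1) / 2) / (n * m)


def auc_pairwise(pos_scores, neg_scores):
    """Empirical AUC as the fraction of correctly ordered (pos, neg) pairs."""
    correct = sum(p > q for p in pos_scores for q in neg_scores)
    return correct / (len(pos_scores) * len(neg_scores))


# Toy scores (hypothetical): four positives, three negatives, no ties.
pos = [0.9, 0.8, 0.35, 0.6]
neg = [0.1, 0.4, 0.7]

print(auc_pairwise(pos, neg))    # 0.75
print(auc_from_ranks(pos, neg))  # 0.75
```

Both computations agree on the toy data. Passing a different `phi` to `linear_rank_statistic`, for instance one that puts more weight on the highest ranks, gives other statistics of the same linear-rank form.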

Funding Statement

This work was supported with grants from Région Ile-de-France, the industrial chairs IDAML of Ecole Normale Supérieure Paris-Saclay and DSAIDIS of Télécom Paris, and ANR (project Limpid).

Acknowledgments

We thank the reviewers for their helpful comments.

Citation

Stephan Clémençon, Myrto Limnios, Nicolas Vayatis. "Concentration inequalities for two-sample rank processes with application to bipartite ranking." Electron. J. Statist. 15(2): 4659-4717, 2021. https://doi.org/10.1214/21-EJS1907

Information

Received: 1 April 2021; Published: 2021

First available in Project Euclid: 27 September 2021

Digital Object Identifier: 10.1214/21-EJS1907

Subjects:

Primary: 62G99, 68Q32

Secondary: 60E15, 62C12

Keywords: bipartite ranking, concentration inequalities, empirical risk minimization, generalization bounds, rank process, statistical learning theory, two-sample linear rank statistics

Rights: Creative Commons Attribution 4.0 International License.

Journal article, 59 pages
