Abstract
The ROC curve is the gold standard for measuring the performance of a test/scoring statistic regarding its capacity to discriminate between two statistical populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring/ranking applications, such as the AUC, the local AUC, the p-norm push, the DCG and others, can be viewed as summaries of the ROC curve. In this paper, we highlight the fact that most of these empirical criteria can be expressed as two-sample linear rank statistics, and we prove concentration inequalities for collections of such random variables, referred to here as two-sample rank processes, when indexed by VC classes of scoring functions. Based on these nonasymptotic bounds, we then investigate, from a theoretical perspective, the generalization capacity of empirical maximizers of a wide class of ranking performance criteria. The theoretical analysis is also supported by empirical evidence from numerical experiments.
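To make the link between ranking criteria and rank statistics concrete, the sketch below computes a two-sample linear rank statistic of the form sum over positive instances of phi(rank / (N + 1)), where ranks are taken in the pooled sample of size N, and checks that the choice phi(u) = u recovers the area under the ROC curve (AUC) through the classical Wilcoxon rank-sum identity. The function names, the normalization by N + 1, the toy scores and the no-ties assumption are illustrative choices made here, not taken from the paper.

```python
def pooled_ranks(pos_scores, neg_scores):
    """Ranks of the positive scores within the pooled sample (1 = smallest).

    Assumption for this sketch: all scores are distinct (no ties).
    """
    pooled = sorted(list(pos_scores) + list(neg_scores))
    rank_of = {value: i + 1 for i, value in enumerate(pooled)}
    return [rank_of[value] for value in pos_scores]


def linear_rank_statistic(pos_scores, neg_scores, phi):
    """Sum of phi(rank / (N + 1)) over the positive sample, N = pooled size."""
    n_total = len(pos_scores) + len(neg_scores)
    ranks = pooled_ranks(pos_scores, neg_scores)
    return sum(phi(r / (n_total + 1)) for r in ranks)


def auc_from_ranks(pos_scores, neg_scores):
    """AUC via the rank-sum identity: (W - n(n + 1)/2) / (n * m)."""
    n, m = len(pos_scores), len(neg_scores)
    rank_sum = sum(pooled_ranks(pos_scores, neg_scores))
    return (rank_sum - n * (n + 1) / 2) / (n * m)


def auc_pairwise(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    correct = sum(p > q for p in pos_scores for q in neg_scores)
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical toy scores, for illustration only.
pos = [0.9, 0.8, 0.35]
neg = [0.7, 0.3, 0.1]

w_identity = linear_rank_statistic(pos, neg, phi=lambda u: u)
auc_rank = auc_from_ranks(pos, neg)
auc_pair = auc_pairwise(pos, neg)
```

Other score-generating functions phi put more weight on the top of the ranked list; for instance, a phi that vanishes below a threshold would give a local-AUC-type criterion under the same assumptions.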
Funding Statement
This work was supported with grants from Région Ile-de-France, the industrial chairs IDAML of Ecole Normale Supérieure Paris-Saclay and DSAIDIS of Télécom Paris, and ANR (project Limpid).
Acknowledgments
We thank the reviewers for their helpful comments.
Citation
Stephan Clémençon, Myrto Limnios, Nicolas Vayatis. "Concentration inequalities for two-sample rank processes with application to bipartite ranking." Electron. J. Statist. 15 (2) 4659-4717, 2021. https://doi.org/10.1214/21-EJS1907