## The Annals of Statistics

### The asymptotics of ranking algorithms

#### Abstract

We consider the predictive problem of supervised ranking, where the task is to rank sets of candidate items returned in response to queries. Although there exist statistical procedures that come with guarantees of consistency in this setting, these procedures require that individuals provide a complete ranking of all items, which is rarely feasible in practice. Instead, individuals routinely provide partial preference information, such as pairwise comparisons of items, and more practical approaches to ranking have aimed at modeling this partial preference data directly. As we show, however, such an approach raises serious theoretical challenges. Indeed, we demonstrate that many commonly used surrogate losses for pairwise comparison data do not yield consistency; surprisingly, we show inconsistency even in low-noise settings. With these negative results as motivation, we present a new approach to supervised ranking based on aggregation of partial preferences, and we develop $U$-statistic-based empirical risk minimization procedures. We present an asymptotic analysis of these new procedures, showing that they yield consistency results that parallel those available for classification. We complement our theoretical results with an experiment studying the new procedures in a large-scale web-ranking task.
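The pairwise-surrogate approach discussed in the abstract can be sketched in a few lines. The snippet below is an illustrative assumption, not the paper's actual procedure: it fits a linear scoring function to pairwise preference data by gradient descent on a logistic surrogate (the function names, the logistic choice, and the synthetic data shape are all hypothetical), which is the kind of scheme whose consistency the paper analyzes.

```python
import numpy as np

def pairwise_logistic_loss(w, X_pref, X_nonpref):
    """Average pairwise logistic surrogate log(1 + exp(-(s_i - s_j))),
    where item i is preferred to item j and scores are s = x @ w."""
    margins = (X_pref - X_nonpref) @ w
    return np.mean(np.log1p(np.exp(-margins)))

def fit_pairwise(X_pref, X_nonpref, lr=0.1, steps=500):
    """Plain gradient descent on the surrogate (hypothetical helper)."""
    w = np.zeros(X_pref.shape[1])
    D = X_pref - X_nonpref  # feature differences, one row per comparison
    for _ in range(steps):
        margins = D @ w
        # gradient of mean log(1 + exp(-m)) with respect to w
        grad = -(D.T @ (1.0 / (1.0 + np.exp(margins)))) / len(margins)
        w -= lr * grad
    return w
```

A central point of the paper is that minimizing surrogates of this pairwise form need not recover an optimal ranking, even in low-noise settings; the proposed alternative instead aggregates partial preferences and minimizes a $U$-statistic-based empirical risk.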

#### Article information

Source
Ann. Statist., Volume 41, Number 5 (2013), 2292-2323.

Dates
First available in Project Euclid: 5 November 2013

https://projecteuclid.org/euclid.aos/1383661265

Digital Object Identifier
doi:10.1214/13-AOS1142

Mathematical Reviews number (MathSciNet)
MR3127867

Zentralblatt MATH identifier
1281.62058

#### Citation

Duchi, John C.; Mackey, Lester; Jordan, Michael I. The asymptotics of ranking algorithms. Ann. Statist. 41 (2013), no. 5, 2292--2323. doi:10.1214/13-AOS1142. https://projecteuclid.org/euclid.aos/1383661265
