Open Access
Surrogate losses in passive and active learning
Steve Hanneke, Liu Yang
Electron. J. Statist. 13(2): 4646-4708 (2019). DOI: 10.1214/19-EJS1635

Abstract

Active learning is a type of sequential design for supervised machine learning, in which the learning algorithm sequentially requests the labels of selected instances from a large pool of unlabeled data points. The objective is to produce a classifier of relatively low risk, as measured under the $0$-$1$ loss, ideally using fewer label requests than the number of random labeled data points sufficient to achieve the same. This work investigates the potential uses of surrogate loss functions in the context of active learning. Specifically, it presents an active learning algorithm based on an arbitrary classification-calibrated surrogate loss function, along with an analysis of the number of label requests sufficient for the classifier returned by the algorithm to achieve a given risk under the $0$-$1$ loss. Interestingly, these results cannot be obtained by simply optimizing the surrogate risk via active learning to an extent sufficient to provide a guarantee on the $0$-$1$ loss, as is common practice in the analysis of surrogate losses for passive learning. Some of the results have additional implications for the use of surrogate losses in passive learning.
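For context, the notion of a classification-calibrated surrogate loss invoked above can be sketched in the standard formalization of Bartlett, Jordan and McAuliffe (2006); the notation below is illustrative and need not match the paper's own. For labels $Y \in \{-1, +1\}$ and a margin loss $\ell$, the $0$-$1$ risk of a classifier $h$ and the surrogate risk of a real-valued function $f$ are
$$R(h) = \mathbb{P}\bigl(h(X) \neq Y\bigr), \qquad R_\ell(f) = \mathbb{E}\bigl[\ell\bigl(Y f(X)\bigr)\bigr].$$
Writing $\eta = \mathbb{P}(Y = +1 \mid X = x)$ and $C_\eta(\alpha) = \eta\,\ell(\alpha) + (1-\eta)\,\ell(-\alpha)$ for the conditional surrogate risk, $\ell$ is classification-calibrated if, for every $\eta \neq 1/2$,
$$\inf_{\alpha :\, \alpha(2\eta - 1) \le 0} C_\eta(\alpha) \;>\; \inf_{\alpha \in \mathbb{R}} C_\eta(\alpha),$$
i.e., any prediction whose sign disagrees with the Bayes label incurs strictly suboptimal conditional surrogate risk. For example, the logistic, hinge, and exponential losses all satisfy this condition.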

Citation


Steve Hanneke, Liu Yang. "Surrogate losses in passive and active learning." Electron. J. Statist. 13(2): 4646-4708, 2019. https://doi.org/10.1214/19-EJS1635

Information

Received: 1 June 2018; Published: 2019
First available in Project Euclid: 13 November 2019

zbMATH: 07136627
MathSciNet: MR4030368
Digital Object Identifier: 10.1214/19-EJS1635

Subjects:
Primary: 62H30, 62L05, 68Q32, 68T05
Secondary: 62G99, 68Q10, 68Q25, 68T10, 68W40

Keywords: Active learning, classification, selective sampling, sequential design, statistical learning theory, surrogate loss functions
