February 2024
Transfer learning for contextual multi-armed bandits
Changxiao Cai, T. Tony Cai, Hongzhe Li
Ann. Statist. 52(1): 207-232 (February 2024). DOI: 10.1214/23-AOS2341


Motivated by a range of applications, in this paper we study transfer learning for nonparametric contextual multi-armed bandits under the covariate shift model, in which data collected from source bandits are available before learning on the target bandit begins. We establish the minimax rate of convergence for the cumulative regret and propose a novel transfer learning algorithm that attains the minimax regret. The results quantify how much the data from the source domains contribute to learning in the target domain for nonparametric contextual multi-armed bandits.
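To make the setting concrete, one standard way to formalize it is sketched below. The notation (f_k, P_X^S, P_X^T, R_n) is ours and not necessarily the paper's. Under covariate shift, the source and target bandits share the conditional mean reward of each arm, and only the distribution of the covariates differs. The cumulative regret compares the policy's choice with the best arm at each observed target covariate.

```latex
% Covariate shift: conditional mean rewards are shared across domains,
% while the covariate marginals of source j and the target may differ.
f_k(x) = \mathbb{E}\left[ Y^{(k)} \mid X = x \right], \quad k = 1, \dots, K,
\qquad P_X^{\mathrm{S}_j} \text{ and } P_X^{\mathrm{T}} \text{ may differ}.

% Cumulative regret of a policy \pi over n target rounds, with X_t \sim P_X^{\mathrm{T}}:
R_n(\pi) = \mathbb{E}\left[ \sum_{t=1}^{n} \Big( \max_{1 \le k \le K} f_k(X_t) - f_{\pi_t}(X_t) \Big) \right].
```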

In view of the general impossibility of adapting to unknown smoothness, we develop a data-driven algorithm that achieves near-optimal statistical guarantees (up to a logarithmic factor) while automatically adapting to the unknown parameters over a large collection of parameter spaces under an additional self-similarity assumption. A simulation study illustrates the benefits of using data from the source domains for learning in the target domain.
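The kind of comparison such a simulation makes can be illustrated with a toy example. The sketch below is a simplified stand-in, not the authors' algorithm: it assumes two arms, hypothetical reward functions on [0, 1], source covariates drawn from Beta(2, 5), target covariates drawn from Uniform(0, 1), and a fixed-bin UCB rule whose per-bin statistics are warm-started with pooled source observations. The bin count is fixed by hand, so the sketch does not adapt to unknown smoothness. All function names and parameters are hypothetical.

```python
import math
import random

N_ARMS = 2


def mean_reward(arm: int, x: float) -> float:
    """Hypothetical conditional mean reward f_k(x), shared by source and target."""
    if arm == 0:
        return 0.5 + 0.3 * math.sin(2 * math.pi * x)
    return 0.5


def draw_reward(rng: random.Random, arm: int, x: float) -> float:
    """Bernoulli reward whose mean is mean_reward(arm, x)."""
    return 1.0 if rng.random() < mean_reward(arm, x) else 0.0


def make_source(n_source: int, seed: int = 1) -> list:
    """Hypothetical logged source data (x, arm, reward)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_source):
        x = rng.betavariate(2.0, 5.0)  # source covariates skewed toward 0
        arm = rng.randrange(N_ARMS)    # uniformly random logging policy
        data.append((x, arm, draw_reward(rng, arm, x)))
    return data


def bin_index(x: float, n_bins: int) -> int:
    return min(int(x * n_bins), n_bins - 1)


def ucb_index(count: int, total: float, t_eff: int) -> float:
    if count == 0:
        return float("inf")  # force at least one pull of every arm in a bin
    return total / count + math.sqrt(2.0 * math.log(t_eff) / count)


def run_binned_ucb(n_target: int, source: list, n_bins: int = 10, seed: int = 0) -> float:
    """Per-bin UCB on the target bandit; returns the cumulative pseudo-regret."""
    rng = random.Random(seed)
    counts = [[0] * N_ARMS for _ in range(n_bins)]
    sums = [[0.0] * N_ARMS for _ in range(n_bins)]
    # Simplified transfer step: pool source observations into the same bins.
    # Pooling is justified here because f_k(x) is shared under covariate shift.
    for x, arm, y in source:
        b = bin_index(x, n_bins)
        counts[b][arm] += 1
        sums[b][arm] += y
    regret = 0.0
    for t in range(1, n_target + 1):
        x = rng.random()  # target covariates ~ Uniform(0, 1)
        b = bin_index(x, n_bins)
        t_eff = t + len(source)
        arm = max(range(N_ARMS), key=lambda a: ucb_index(counts[b][a], sums[b][a], t_eff))
        y = draw_reward(rng, arm, x)
        counts[b][arm] += 1
        sums[b][arm] += y
        best = max(mean_reward(a, x) for a in range(N_ARMS))
        regret += best - mean_reward(arm, x)
    return regret


if __name__ == "__main__":
    n = 5000
    print(f"regret without source data: {run_binned_ucb(n, source=[]):.1f}")
    print(f"regret with source data:    {run_binned_ucb(n, source=make_source(2000)):.1f}")
```

Both runs fix their random seeds, so the printed numbers are reproducible, but no particular values are claimed here; how much the warm start helps depends on how much source mass falls into each bin relative to where the target covariates land.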



Changxiao Cai, T. Tony Cai, Hongzhe Li. "Transfer learning for contextual multi-armed bandits." Ann. Statist. 52(1): 207-232, February 2024. https://doi.org/10.1214/23-AOS2341


Received: 1 November 2022; Revised: 1 November 2023; Published: February 2024
First available in Project Euclid: 7 March 2024

MathSciNet: MR4718413
Digital Object Identifier: 10.1214/23-AOS2341

Primary: 62G08
Secondary: 62L12

Keywords: adaptivity, contextual multi-armed bandit, covariate shift, minimax rate, regret bounds, self-similarity, transfer learning

Rights: Copyright © 2024 Institute of Mathematical Statistics



Vol.52 • No. 1 • February 2024