Open Access
Learning vs earning trade-off with missing or censored observations: The two-armed Bayesian nonparametric beta-Stacy bandit problem
Stefano Peluso, Antonietta Mira, Pietro Muliere
Electron. J. Statist. 11(2): 3368-3406 (2017). DOI: 10.1214/17-EJS1342

Abstract

Existing Bayesian nonparametric methodologies for bandit problems focus on exact observations, leaving a gap in bandit applications where censored observations are crucial. We address this gap by extending a Bayesian nonparametric two-armed bandit problem to right-censored data, where each arm is generated from a beta-Stacy process as defined by Walker and Muliere (1997). We first establish properties of the expected advantage of choosing one arm over the other: monotonicity in the arm response and, restricted to the case of a continuous state space, continuity in the right-censored arm response. We partially characterize optimal strategies by proving the existence of stay-with-a-winner and stay-with-a-winner/switch-on-a-loser break-even points, under non-restrictive conditions that include the special cases of the simple homogeneous process and the Dirichlet process. Numerical estimates and simulations for a variety of discrete and continuous state space settings illustrate the performance and flexibility of our framework.
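To make the construction summarized above concrete, the following is a minimal illustrative sketch, not the authors' algorithm: it implements the discrete-time beta-Stacy conjugate update of Walker and Muliere (1997) for exact and right-censored observations, and plays the arm with the larger posterior-mean expected reward (a greedy, stay-with-a-winner flavoured rule). The grid size K, the prior pseudo-counts, the censoring mechanism, and the greedy rule itself are all illustrative assumptions.

import random

class BetaStacyArm:
    # Discrete-time beta-Stacy prior on a lifetime supported on {1, ..., K}:
    # F(t) = 1 - prod_{k <= t} (1 - V_k), with independent V_k ~ Beta(alpha_k, beta_k).
    def __init__(self, alpha, beta):
        self.alpha = list(alpha)  # "failure" pseudo-counts (exact observations)
        self.beta = list(beta)    # "survival" pseudo-counts

    def update(self, t, censored=False):
        # Conjugate update: an exact observation at t increments alpha_t and the
        # survival counts beta_k for k < t; a right-censored observation at t
        # (value known to exceed t) increments beta_k for k <= t.
        last = t if censored else t - 1
        for k in range(last):         # 0-based index k corresponds to time k + 1
            self.beta[k] += 1
        if not censored:
            self.alpha[t - 1] += 1

    def expected_reward(self):
        # Posterior-mean expected lifetime, E[X] = sum_{t >= 0} P(X > t), with
        # E[P(X > t)] = prod_{k <= t} beta_k / (alpha_k + beta_k).
        surv, total = 1.0, 0.0
        for a, b in zip(self.alpha, self.beta):
            total += surv
            surv *= b / (a + b)
        return total

def simulate(horizon=200, K=20, censor_prob=0.2, seed=0):
    # Toy experiment: true lifetimes are geometric-like on {1, ..., K}; the played
    # arm's outcome is right-censored with probability censor_prob (assumption).
    rng = random.Random(seed)
    arms = [BetaStacyArm([1.0] * K, [1.0] * K) for _ in range(2)]
    fail_prob = [0.25, 0.15]          # per-period failure probabilities (assumption)
    total_reward = 0.0
    for _ in range(horizon):
        # Greedy choice: the arm with the larger posterior-mean expected reward.
        i = max(range(2), key=lambda j: arms[j].expected_reward())
        t = 1
        while t < K and rng.random() > fail_prob[i]:
            t += 1
        censored = t > 1 and rng.random() < censor_prob
        if censored:
            t -= 1                    # only the event {X > t} is observed
        arms[i].update(t, censored=censored)
        total_reward += t
    return total_reward, [a.expected_reward() for a in arms]

if __name__ == "__main__":
    print(simulate())

Note that this greedy rule is only a stand-in: the paper instead characterizes when stay-with-a-winner and switch-on-a-loser strategies are optimal via break-even points of the exact expected advantage under the beta-Stacy posterior.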

Citation


Stefano Peluso, Antonietta Mira, Pietro Muliere. "Learning vs earning trade-off with missing or censored observations: The two-armed Bayesian nonparametric beta-Stacy bandit problem." Electron. J. Statist. 11(2): 3368-3406, 2017. https://doi.org/10.1214/17-EJS1342

Information

Received: 1 December 2016; Published: 2017
First available in Project Euclid: 6 October 2017

zbMATH: 1377.62033
MathSciNet: MR3709858
Digital Object Identifier: 10.1214/17-EJS1342

Subjects:
Primary: 62C10
Secondary: 62N01
