We study dynamic allocation problems for discrete-time multi-armed bandits under uncertainty, based on the theory of nonlinear expectations. We show that, under an independence assumption on the bandits and with some relaxation in the definition of optimality, a Gittins allocation index gives optimal choices. This involves studying the interaction of our uncertainty with controls which determine the filtration. We also run a simple numerical example which illustrates the interaction between the agent's willingness to explore and uncertainty aversion when making decisions.
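For orientation, the following is a minimal sketch of the classical Gittins index for a single finite-state Markov arm under an ordinary (linear) expectation. It is not the index under nonlinear expectation developed in the paper. It assumes a known transition matrix `P`, a reward vector `r`, and a discount factor `beta` in (0, 1). The function name `gittins_indices` and all parameter names are hypothetical. The sketch uses the standard calibration idea: the index of state x is the per-step charge `lam` at which continuing for at least one step from x has optimal value exactly zero.

```python
import numpy as np


def gittins_indices(P, r, beta, tol=1e-8):
    """Classical Gittins indices of one Markov arm (illustrative sketch).

    P    : (n, n) row-stochastic transition matrix (assumed known)
    r    : (n,) reward received in each state
    beta : discount factor, 0 < beta < 1
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)

    def continuation_values(lam):
        # Value iteration for the stopping problem with per-step charge lam:
        #   V = max(0, r - lam + beta * P V).
        # Returns the value of playing at least once from each state.
        V = np.zeros(n)
        while True:
            V_new = np.maximum(r - lam + beta * P @ V, 0.0)
            if np.max(np.abs(V_new - V)) < tol * (1.0 - beta):
                return r - lam + beta * P @ V_new
            V = V_new

    indices = np.empty(n)
    for x in range(n):
        # The continuation value at x is decreasing in lam, non-negative at
        # lam = min(r) and non-positive at lam = max(r), so bisect on lam.
        lo, hi = r.min(), r.max()
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if continuation_values(mid)[x] >= 0.0:
                lo = mid
            else:
                hi = mid
        indices[x] = 0.5 * (lo + hi)
    return indices
```

As a check, take a two-state arm in which state 0 pays 0 and moves to state 1, and state 1 pays 1 forever. With `beta = 0.5`, `gittins_indices([[0, 1], [0, 1]], [0, 1], 0.5)` returns approximately 0.5 for state 0 and 1.0 for state 1. In the classical setting the allocation rule plays, at each step, the arm whose current state has the largest index.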
Samuel Cohen thanks the Oxford-Man Institute for research support and acknowledges the support of The Alan Turing Institute under the Engineering and Physical Sciences Research Council grant EP/N510129/1. Tanut Treetanthiploet thanks the University of Oxford for research support while completing this work, and acknowledges the support of the Development and Promotion of Science and Technology Talents Project (DPST) of the Government of Thailand.
"Gittins’ theorem under uncertainty." Electron. J. Probab. 27 1 - 48, 2022. https://doi.org/10.1214/22-EJP742