Bernoulli, Volume 21, Number 4 (2015), 2289-2307.

Adaptive-treed bandits

Adam D. Bull

Full-text: Open access

Abstract

We describe a novel algorithm for noisy global optimisation and continuum-armed bandits, with good convergence properties over any continuous reward function having finitely many polynomial maxima. Over such functions, our algorithm achieves square-root regret in bandits, and inverse-square-root error in optimisation, without prior information.
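For reference, the two rates can be written in standard bandit notation. The symbols below are assumptions for illustration and do not appear in the abstract: $f$ is the reward function, $x_1, \dots, x_T$ are the arms played over $T$ rounds, and $\hat{x}_T$ is the point recommended after $T$ evaluations.

$$R_T = \sum_{t=1}^{T} \bigl(f^* - f(x_t)\bigr), \qquad f^* = \sup_x f(x).$$

Under this reading, square-root regret means that the expected value of $R_T$ grows on the order of $\sqrt{T}$, and inverse-square-root error means that $f^* - f(\hat{x}_T)$ is on the order of $1/\sqrt{T}$, in each case possibly up to logarithmic factors, which the abstract does not specify.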

Our algorithm works by reducing these problems to tree-armed bandits, and we also provide new results in this setting. We show it is possible to adaptively combine multiple trees so as to minimise the regret, and also give near-matching lower bounds on the regret in terms of the zooming dimension.
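The reduction from a continuum-armed bandit to a tree-armed bandit can be illustrated with a minimal sketch. This is not the algorithm of the paper: it uses a single fixed dyadic tree over [0, 1] and a generic UCT-style index, with no adaptive combination of trees. The names `Node`, `select_child`, `run_tree_bandit` and `recommend`, the fixed depth, the Gaussian noise model and the known noise level `sigma` are all assumptions made for the example.

```python
import math
import random


class Node:
    """A cell [lo, hi) of the arm space, with running reward statistics."""

    def __init__(self, lo, hi, depth, max_depth):
        self.lo, self.hi = lo, hi
        self.n = 0        # number of pulls routed through this cell
        self.mean = 0.0   # average reward observed through this cell
        self.children = []
        if depth < max_depth:
            mid = (lo + hi) / 2
            self.children = [Node(lo, mid, depth + 1, max_depth),
                             Node(mid, hi, depth + 1, max_depth)]

    def update(self, reward):
        self.n += 1
        self.mean += (reward - self.mean) / self.n


def select_child(node, sigma):
    """Pick an unvisited child first, else the child with the largest UCB index."""
    for child in node.children:
        if child.n == 0:
            return child
    log_n = math.log(node.n)
    return max(node.children,
               key=lambda c: c.mean + sigma * math.sqrt(2 * log_n / c.n))


def run_tree_bandit(reward, rounds, max_depth=4, sigma=0.05, seed=0):
    """Play `rounds` noisy evaluations of `reward` on [0, 1] via the tree."""
    rng = random.Random(seed)
    root = Node(0.0, 1.0, 0, max_depth)
    for _ in range(rounds):
        # Descend from the root to a leaf cell, one index comparison per level.
        path = [root]
        while path[-1].children:
            path.append(select_child(path[-1], sigma))
        leaf = path[-1]
        x = (leaf.lo + leaf.hi) / 2               # play the cell's midpoint
        y = reward(x) + rng.gauss(0.0, sigma)     # noisy observation (assumed Gaussian)
        for node in path:                         # credit every cell on the path
            node.update(y)
    return root


def recommend(root):
    """Follow the most-visited child at each level and return the leaf reached."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.n)
    return node


if __name__ == "__main__":
    # Hypothetical reward with a single maximum at x = 0.3.
    root = run_tree_bandit(lambda x: 1.0 - abs(x - 0.3), rounds=2000)
    leaf = recommend(root)
    print("recommended cell:", leaf.lo, leaf.hi)
```

Each arm of the tree-armed problem is a cell of the partition, so the continuous problem is handled by choosing which cell to refine at each level. A fixed tree of this kind bakes in one partition and one depth; the abstract's point is that several trees can be combined adaptively so that such choices need not be made in advance.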

Article information

Source
Bernoulli, Volume 21, Number 4 (2015), 2289-2307.

Dates
Received: February 2013
Revised: February 2014
First available in Project Euclid: 5 August 2015

Permanent link to this document
https://projecteuclid.org/euclid.bj/1438777594

Digital Object Identifier
doi:10.3150/14-BEJ644

Mathematical Reviews number (MathSciNet)
MR3378467

Zentralblatt MATH identifier
1364.90269

Keywords
bandits on taxonomies; continuum-armed bandits; noisy global optimisation; tree-armed bandits; zooming dimension

Citation

Bull, Adam D. Adaptive-treed bandits. Bernoulli 21 (2015), no. 4, 2289-2307. doi:10.3150/14-BEJ644. https://projecteuclid.org/euclid.bj/1438777594



