Journal of Applied Probability

Sample-path optimal stationary policies in stable Markov decision chains with the average reward criterion

Rolando Cavazos-Cadena, Raúl Montes-de-Oca, and Karel Sladký

Abstract

This paper concerns discrete-time Markov decision chains with a denumerable state space and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated with ℓ² is finite under every policy, then a stationary policy obtained from the optimality equation in the standard way is sample-path average optimal in a strong sense.
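
Since the full text is not reproduced on this page, the following is a minimal LaTeX sketch of the standard objects behind the abstract's claim. The notation (S for the state space, A(x) for the action set at x, r for the one-step reward, p for the transition law, g for the optimal gain, h for a relative value function) is the conventional one for average-reward Markov decision chains; it is an assumption of this sketch, not quoted from the paper.

  \documentclass{article}
  \usepackage{amsmath}
  \begin{document}
  % Average-reward optimality equation (conventional form; notation assumed,
  % not taken from the paper). A stationary policy f that, at each state x,
  % selects an action attaining the maximum is the "stationary policy
  % obtained from the optimality equation in the standard way".
  \[
    g + h(x) = \max_{a \in A(x)} \Bigl[\, r(x,a)
      + \sum_{y \in S} p(y \mid x, a)\, h(y) \Bigr],
    \qquad x \in S.
  \]
  % Strong sample-path average optimality of f: under every policy \pi and
  % every initial state x, the long-run average reward is almost surely
  % bounded above by g,
  \[
    \limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r(X_t, A_t) \le g
    \quad \text{$P_x^{\pi}$-almost surely,}
  \]
  % while under f the limit of the averages exists and equals g almost surely.
  \end{document}

In this language, the paper's conclusion is that finiteness of the expected average reward generated by ℓ² under every policy suffices to upgrade such a policy f from expected-average optimal to sample-path average optimal in the strong sense sketched above.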

Article information

Source
J. Appl. Probab., Volume 52, Number 2 (2015), 419–440.

Dates
First available in Project Euclid: 23 July 2015

Permanent link to this document
https://projecteuclid.org/euclid.jap/1437658607

Digital Object Identifier
doi:10.1239/jap/1437658607

Mathematical Reviews number (MathSciNet)
MR3372084

Zentralblatt MATH identifier
1327.90366

Subjects
Primary: 90C40: Markov and semi-Markov decision processes
Secondary: 93E20: Optimal stochastic control; 60J05: Discrete-time Markov processes on general state spaces

Keywords
Dominated convergence theorem for the expected average criterion; discrepancy function; Kolmogorov inequality; innovations; strong sample-path optimality

Citation

Cavazos-Cadena, Rolando; Montes-de-Oca, Raúl; Sladký, Karel. Sample-path optimal stationary policies in stable Markov decision chains with the average reward criterion. J. Appl. Probab. 52 (2015), no. 2, 419–440. doi:10.1239/jap/1437658607. https://projecteuclid.org/euclid.jap/1437658607

