Abstract and Applied Analysis

Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces

Quanxin Zhu, Xinsong Yang, and Chuangxia Huang


Abstract

We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces. The transition rates may be unbounded, and the reward rates may have neither upper nor lower bounds. The optimality criterion is the expected average reward. We propose a set of conditions under which we first establish the average reward optimality equation and present the PIA. Then, under two slightly different sets of conditions, we show that the PIA yields the optimal (maximum) reward, an average optimal stationary policy, and a solution to the average reward optimality equation.
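
The PIA is only named in the abstract, so the following minimal sketch may help fix ideas. It is a hypothetical illustration, not the paper's construction: it assumes a finite state and action space, bounded (conservative) transition rates, and an ergodic model, whereas the paper works in Polish spaces with possibly unbounded rates and rewards. The arrays q and r and the helpers evaluate and policy_iteration are names introduced here for illustration only. Each iteration evaluates the current stationary policy f by solving the Poisson equation g = r(i, f(i)) + sum_j q(j | i, f(i)) h(j), then improves f greedily; the algorithm stops when no action changes, at which point (g, h) together with f solve the average reward optimality equation. A minimal sketch in Python:

    import numpy as np

    def evaluate(q, r, f):
        """Solve the Poisson equation r_f(i) + sum_j q_f(j|i) h(j) = g
        for policy f, normalized by h[0] = 0. Assumes the controlled
        chain is ergodic, so the linear system has a unique solution."""
        n = q.shape[0]
        Qf = q[np.arange(n), f, :]      # rate matrix under f, shape (n, n)
        rf = r[np.arange(n), f]         # reward rates under f, shape (n,)
        # Unknowns x = (h_0, ..., h_{n-1}, g); rows 0..n-1 encode the
        # Poisson equation Q_f h - g = -r_f, the last row encodes h_0 = 0.
        A = np.zeros((n + 1, n + 1))
        A[:n, :n] = Qf
        A[:n, n] = -1.0
        A[n, 0] = 1.0
        b = np.append(-rf, 0.0)
        x = np.linalg.solve(A, b)
        return x[n], x[:n]              # average reward g, bias vector h

    def policy_iteration(q, r, max_iter=100):
        """PIA sketch for a finite continuous-time MDP.
        q[i, a, j]: conservative transition rates (rows sum to zero);
        r[i, a]: reward rates."""
        n, m, _ = q.shape
        f = np.zeros(n, dtype=int)      # arbitrary initial stationary policy
        for _ in range(max_iter):
            g, h = evaluate(q, r, f)
            scores = r + q @ h          # r(i, a) + sum_j q(j|i, a) h(j)
            # Keep the current action where it still attains the maximum
            # (standard rule to prevent cycling between equivalent policies).
            best = scores.max(axis=1)
            f_new = np.where(scores[np.arange(n), f] >= best - 1e-10,
                             f, scores.argmax(axis=1))
            if np.array_equal(f_new, f):
                break                   # f, g, h solve the optimality equation
            f = f_new
        return f, g, h

    # Tiny two-state, two-action example with made-up rates and rewards.
    q = np.array([[[-1.0, 1.0], [-3.0, 3.0]],
                  [[2.0, -2.0], [0.5, -0.5]]])
    r = np.array([[1.0, 2.0],
                  [0.0, -0.5]])
    f, g, h = policy_iteration(q, r)
    print("optimal policy:", f, "average reward:", g)

On this toy model the sketch converges in two iterations, returning the policy that uses the faster, higher-reward action in state 0 and the slower action in state 1, with average reward 0.8.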

Article information

Source
Abstr. Appl. Anal., Volume 2009 (2009), Article ID 103723, 17 pages.

Dates
First available in Project Euclid: 16 March 2010

Permanent link to this document
https://projecteuclid.org/euclid.aaa/1268745624

Digital Object Identifier
doi:10.1155/2009/103723

Mathematical Reviews number (MathSciNet)
MR2581137

Zentralblatt MATH identifier
1192.90243

Citation

Zhu, Quanxin; Yang, Xinsong; Huang, Chuangxia. Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces. Abstr. Appl. Anal. 2009 (2009), Article ID 103723, 17 pages. doi:10.1155/2009/103723. https://projecteuclid.org/euclid.aaa/1268745624


