Annals of Applied Probability

Average optimality for continuous-time Markov decision processes in Polish spaces

Xianping Guo and Ulrich Rieder

Full-text: Open access

Abstract

This paper is devoted to studying average optimality in continuous-time Markov decision processes with fairly general state and action spaces. The criterion to be maximized is the expected average reward. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We first provide two optimality inequalities with opposed directions, and give suitable conditions under which the existence of solutions to both inequalities is ensured. Then, from the two optimality inequalities we prove the existence of optimal (deterministic) stationary policies by using the Dynkin formula. Moreover, we present a “semimartingale characterization” of an optimal stationary policy. Finally, we use a generalized Potlach process with control to illustrate the difference between our conditions and those in the previous literature, and we further apply our results to average optimal control problems for generalized birth–death systems, upwardly skip-free processes and two queueing systems. The approach developed in this paper differs slightly from the “optimality inequality approach” widely used in the previous literature.
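
For orientation, the following LaTeX display sketches the standard long-run expected average reward criterion and the kind of opposed optimality inequalities the abstract refers to. The notation (J, g, h, q, A(x), S) is generic and assumed here for illustration; it is not quoted from the paper.

% Sketch under assumed generic notation: expected average reward of a
% policy \pi from initial state x, with reward rate r and controlled
% transition rates q(dy|x,a) on a Polish state space S.
\[
  J(x,\pi) \;=\; \liminf_{T\to\infty}\,\frac{1}{T}\,
  \mathbb{E}^{\pi}_{x}\!\left[\int_{0}^{T} r(x_t,a_t)\,dt\right],
  \qquad
  J^{*}(x) \;=\; \sup_{\pi} J(x,\pi).
\]
% Two optimality inequalities with opposed directions: a constant g
% (candidate optimal average reward) and a function h on S satisfying
\[
  g \;\le\; \sup_{a\in A(x)}\Bigl\{\, r(x,a) + \int_{S} h(y)\, q(dy\mid x,a) \Bigr\}
  \quad\text{and}\quad
  g \;\ge\; \sup_{a\in A(x)}\Bigl\{\, r(x,a) + \int_{S} h(y)\, q(dy\mid x,a) \Bigr\}
  \quad\text{for all } x\in S.
\]
% In this kind of argument, a deterministic stationary policy attaining
% the supremum in the second inequality is average optimal, and the
% Dynkin formula applied to h supplies the verification step.

Under conditions of the type imposed in the paper, solutions to both inequalities exist, which together identify g with the optimal average reward J*.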

Article information

Source
Ann. Appl. Probab., Volume 16, Number 2 (2006), 730–756.

Dates
First available in Project Euclid: 29 June 2006

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1151592249

Digital Object Identifier
doi:10.1214/105051606000000105

Mathematical Reviews number (MathSciNet)
MR2244431

Zentralblatt MATH identifier
1160.90010

Subjects
Primary: 90C40: Markov and semi-Markov decision processes; 93E20: Optimal stochastic control

Keywords
Average reward; general state space; optimality inequality; optimal stationary policy; semimartingale characterization

Citation

Guo, Xianping; Rieder, Ulrich. Average optimality for continuous-time Markov decision processes in Polish spaces. Ann. Appl. Probab. 16 (2006), no. 2, 730–756. doi:10.1214/105051606000000105. https://projecteuclid.org/euclid.aoap/1151592249


References

  • Anderson, W. J. (1991). Continuous-Time Markov Chains. Springer, New York.
  • Borkar, V. S. (1989). Optimal Control of Diffusion Processes. Longman Sci. Tech., Harlow.
  • Chen, M. F. (2000). Equivalence of exponential ergodicity and $L^2$-exponential convergence for Markov chains. Stochastic Process. Appl. 87 281–297.
  • Chen, M. F. (2004). From Markov Chains to Non-Equilibrium Particle Systems, 2nd ed. World Scientific Publishing, River Edge, NJ.
  • Doshi, B. T. (1976). Continuous-time control of Markov processes on an arbitrary state space: Average return criterion. Ann. Statist. 4 1219–1235.
  • Dong, Z. Q. (1979). Continuous time Markov decision programming with average reward criterion: countable state and action space. Sci. Sinica Special Issue (II) 141–148.
  • Down, D., Meyn, S. P. and Tweedie, R. L. (1995). Exponential and uniform ergodicity of Markov processes. Ann. Probab. 23 1671–1691.
  • Feller, W. (1940). On the integro-differential equations of purely discontinuous Markoff processes. Trans. Amer. Math. Soc. 48 488–515.
  • Fleming, W. H. and Soner, H. M. (1993). Controlled Markov Processes and Viscosity Solutions. Springer, New York.
  • Gihman, I. I. and Skorohod, A. V. (1979). Controlled Stochastic Processes. Springer, New York.
  • Guo, X. P. (2003). Continuous-time Markov decision processes with discounted rewards: The case of Polish spaces. Unpublished manuscript.
  • Guo, X. P. and Cao, X.-R. (2005). Optimal control of ergodic continuous-time Markov chains with average sample-path rewards. SIAM J. Control Optim. 44 29–48.
  • Guo, X. P. and Hernández-Lerma, O. (2003). Drift and monotonicity conditions for continuous-time controlled Markov chains with an average criterion. IEEE Trans. Automat. Control 48 236–245.
  • Guo, X. P. and Hernández-Lerma, O. (2003). Continuous-time controlled Markov chains. Ann. Appl. Probab. 13 363–388.
  • Guo, X. P. and Hernández-Lerma, O. (2003). Zero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates. J. Appl. Probab. 40 327–345.
  • Guo, X. P. and Liu, K. (2001). A note on optimality conditions for continuous-time Markov decision processes with average cost criterion. IEEE Trans. Automat. Control 46 1984–1989.
  • Guo, X. P. and Zhu, W. P. (2002). Denumerable state continuous time Markov decision processes with unbounded cost and transition rates under average criterion. ANZIAM J. 43 541–557.
  • Haviv, M. and Puterman, M. L. (1998). Bias optimality in controlled queuing systems. J. Appl. Probab. 35 136–150.
  • Hernández-Lerma, O. (1994). Lectures on Continuous-Time Markov Control Processes. Soc. Mat. Mexicana, México.
  • Hernández-Lerma, O. and Lasserre, J. B. (1996). Discrete-Time Markov Control Processes. Springer, New York.
  • Hernández-Lerma, O. and Lasserre, J. B. (1999). Further Topics on Discrete-Time Markov Control Processes. Springer, New York.
  • Holley, R. and Liggett, T. M. (1981). Generalized potlatch and smoothing processes. Z. Wahrsch. Verw. Gebiete 55 165–195.
  • Howard, R. A. (1960). Dynamic Programming and Markov Processes. Wiley, New York.
  • Kakumanu, P. (1972). Nondiscounted continuous-time Markov decision processes with countable state and action spaces. SIAM J. Control 10 210–220.
  • Kitaev, M. Y. and Rykov, V. V. (1995). Controlled Queueing Systems. CRC Press, Boca Raton, FL.
  • Lembersky, M. R. (1974). On maximal rewards and $\varepsilon$-optimal policies in continuous time Markov chains. Ann. Statist. 2 159–169.
  • Lewis, M. E. and Puterman, M. L. (2000). A note on bias optimality in controlled queueing systems. J. Appl. Probab. 37 300–305.
  • Lund, R. B., Meyn, S. P. and Tweedie, R. L. (1996). Computable exponential convergence rates for stochastically ordered Markov processes. Ann. Appl. Probab. 6 218–237.
  • Meyn, S. P. and Tweedie, R. L. (1993). Stability of Markovian processes III: Foster–Lyapunov criteria for continuous-time processes. Adv. in Appl. Probab. 25 518–548.
  • Miller, B. L. (1968). Finite state continuous time Markov decision processes with an infinite planning horizon. J. Math. Anal. Appl. 22 552–569.
  • Puterman, M. L. (1994). Markov Decision Processes. Wiley, New York.
  • Rao, M. M. (1995). Stochastic Processes: General Theory. Kluwer, Dordrecht.
  • Rieder, U. (1978). Measurable selection theorems for optimization problems. Manuscripta Math. 24 115–131.
  • Sennott, L. I. (1999). Stochastic Dynamic Programming and the Control of Queueing Systems. Wiley, New York.
  • Song, J. S. (1987). Continuous time Markov decision programming with non-uniformly bounded transition rates. Scientia Sinica 12 1258–1267.
  • Tweedie, R. L. (1981). Criteria for ergodicity, exponential ergodicity and strong ergodicity of Markov processes. J. Appl. Probab. 18 122–130.
  • Widder, D. V. (1941). The Laplace Transform. Princeton Univ. Press.
  • Williams, D. (1979). Diffusions, Markov Processes, and Martingales. Wiley, New York.
  • Yushkevich, A. A. and Feinberg, E. A. (1979). On homogeneous Markov models with continuous time and finite or countable state space. Theory Probab. Appl. 24 156–161.
  • Zeifman, A. I. (1991). Some estimates of the rate of convergence for birth and death processes. J. Appl. Probab. 28 268–277.
  • Zheng, S. H. (1991). Continuous time Markov decision programming with average reward criterion and unbounded reward rates. Acta Math. Appl. Sinica 7 6–16.