The Annals of Applied Probability

Discounted continuous-time constrained Markov decision processes in Polish spaces

Xianping Guo and Xinyuan Song

Abstract

This paper is devoted to studying constrained continuous-time Markov decision processes (MDPs) in the class of randomized policies depending on state histories. The transition rates may be unbounded, the reward and cost rates are allowed to be unbounded from above and from below, and the state and action spaces are Polish spaces. The optimality criterion to be maximized is the expected discounted reward, and constraints are imposed on the expected discounted costs. First, we give conditions for the nonexplosion of the underlying processes and the finiteness of the expected discounted rewards/costs. Second, using a technique of occupation measures, we prove that the constrained optimality problem for continuous-time MDPs can be transformed into an equivalent optimization problem over a class of probability measures. Based on this equivalent problem and a so-called w̄-weak convergence of probability measures developed in this paper, we show the existence of a constrained optimal policy. Third, by providing a linear programming formulation of the equivalent problem, we show the solvability of the constrained optimality problem. Finally, we use two computable examples to illustrate our main results.
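To fix ideas, the constrained problem described in the abstract can be sketched schematically as follows. The notation here is generic and not the paper's exact statement: α > 0 is the discount rate, γ the initial distribution, r the reward rate, c_1, …, c_N the cost rates with constraint constants d_1, …, d_N, and π ranges over history-dependent randomized policies.

\[
\text{maximize}\quad V(\pi) \;=\; \mathbb{E}^{\pi}_{\gamma}\!\left[\int_{0}^{\infty} e^{-\alpha t}\, r(x_t,a_t)\,dt\right]
\quad\text{subject to}\quad
\mathbb{E}^{\pi}_{\gamma}\!\left[\int_{0}^{\infty} e^{-\alpha t}\, c_n(x_t,a_t)\,dt\right] \;\le\; d_n,\quad n=1,\dots,N.
\]

The occupation-measure technique replaces each policy π by the normalized measure η_π on the state–action set K defined by η_π(B) = α ∫_0^∞ e^{-αt} P^π_γ((x_t,a_t) ∈ B) dt, so that V(π) = α^{-1} ∫_K r dη_π. With q(dy|x,a) denoting the transition rates on the state space S, the linear programming formulation then reads, schematically,

\[
\text{maximize}\quad \alpha^{-1}\!\int_{K} r\,d\eta
\quad\text{over measures } \eta \text{ on } K \text{ satisfying}\quad
\int_{K}\!\Big(\alpha\,u(x)-\int_{S} u(y)\,q(dy\mid x,a)\Big)\,\eta(dx,da)\;=\;\alpha\!\int_{S} u(x)\,\gamma(dx)
\]

for all test functions u in a suitable domain, together with α^{-1} ∫_K c_n dη ≤ d_n for each n. The precise definitions, domains and conditions (nonexplosion, the w̄-weight function, and so on) are those given in the paper.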

Article information

Source
Ann. Appl. Probab., Volume 21, Number 5 (2011), 2016–2049.

Dates
First available in Project Euclid: 25 October 2011

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1319576616

Digital Object Identifier
doi:10.1214/10-AAP749

Mathematical Reviews number (MathSciNet)
MR2884058

Zentralblatt MATH identifier
1258.90104

Subjects
Primary: 90C40: Markov and semi-Markov decision processes; 60J27: Continuous-time Markov processes on discrete state spaces

Keywords
Continuous-time Markov decision process; unbounded transition rates; occupation measure; linear programming formulation; constrained optimal policy

Citation

Guo, Xianping; Song, Xinyuan. Discounted continuous-time constrained Markov decision processes in Polish spaces. Ann. Appl. Probab. 21 (2011), no. 5, 2016–2049. doi:10.1214/10-AAP749. https://projecteuclid.org/euclid.aoap/1319576616



References

  • [1] Altman, E. (1999). Constrained Markov Decision Processes. Chapman and Hall/CRC, Boca Raton, FL.
  • [2] Altman, E. and Shwartz, A. (1991). Markov decision problems and state-action frequencies. SIAM J. Control Optim. 29 786–809.
  • [3] Ash, R. B. (2000). Probability and Measure Theory, 2nd ed. Academic Press, Burlington, MA.
  • [4] Bertsekas, D. P. and Shreve, S. E. (1996). Stochastic Optimal Control: The Discrete-Time Case. Athena Scientific, Belmont, MA.
  • [5] Chen, M.-F. (2004). From Markov Chains to Non-equilibrium Particle Systems, 2nd ed. World Scientific, River Edge, NJ.
  • [6] Chen, R. C. and Feinberg, E. A. (2007). Non-randomized policies for constrained Markov decision processes. Math. Methods Oper. Res. 66 165–179.
  • [7] Feinberg, E. A. (2000). Constrained discounted Markov decision processes and Hamiltonian cycles. Math. Oper. Res. 25 130–140.
  • [8] Feinberg, E. A. and Shwartz, A. (1995). Constrained Markov decision models with weighted discounted rewards. Math. Oper. Res. 20 302–320.
  • [9] Feinberg, E. A. and Shwartz, A. (1996). Constrained discounted dynamic programming. Math. Oper. Res. 21 922–945.
  • [10] Feinberg, E. A. and Shwartz, A. (1999). Constrained dynamic programming with two discount factors: Applications and an algorithm. IEEE Trans. Automat. Control 44 628–631.
  • [11] Guo, X. P. (2007). Constrained optimization for average cost continuous-time Markov decision processes. IEEE Trans. Automat. Control 52 1139–1143.
  • [12] Guo, X. P. (2007). Continuous-time Markov decision processes with discounted rewards: The case of Polish spaces. Math. Oper. Res. 32 73–87.
  • [13] Guo, X. P. and Hernández-Lerma, O. (2003). Constrained continuous-time Markov control processes with discounted criteria. Stochastic Anal. Appl. 21 379–399.
  • [14] Guo, X. P. and Hernández-Lerma, O. (2003). Continuous-time controlled Markov chains. Ann. Appl. Probab. 13 363–388.
  • [15] Guo, X. P. and Hernández-Lerma, O. (2009). Continuous-time Markov Decision Processes: Theory and Applications. Stochastic Modelling and Applied Probability 62. Springer, Berlin.
  • [16] Guo, X. P. and Rieder, U. (2006). Average optimality for continuous-time Markov decision processes in Polish spaces. Ann. Appl. Probab. 16 730–756.
  • [17] Guo, X. P., Hernández-Lerma, O. and Prieto-Rumeau, T. (2006). A survey of recent results on continuous-time Markov decision processes. TOP 14 177–246.
  • [18] Haviv, M. and Puterman, M. L. (1998). Bias optimality in controlled queueing systems. J. Appl. Probab. 35 136–150.
  • [19] Hernández-Lerma, O. and González-Hernández, J. (2000). Constrained Markov control processes in Borel spaces: The discounted case. Math. Methods Oper. Res. 52 271–285.
  • [20] Hernández-Lerma, O., González-Hernández, J. and López-Martínez, R. R. (2003). Constrained average cost Markov control processes in Borel spaces. SIAM J. Control Optim. 42 442–468 (electronic).
  • [21] Hernández-Lerma, O. and Lasserre, J. B. (1996). Discrete-time Markov Control Processes: Basic Optimality Criteria. Applications of Mathematics (New York) 30. Springer, New York.
  • [22] Hernández-Lerma, O. and Lasserre, J. B. (1999). Further Topics on Discrete-time Markov Control Processes. Applications of Mathematics (New York) 42. Springer, New York.
  • [23] Hordijk, A. and Spieksma, F. (1989). Constrained admission control to a queueing system. Adv. in Appl. Probab. 21 409–431.
  • [24] Jacod, J. (1975). Multivariate point processes: Predictable projection, Radon–Nikodým derivatives, representation of martingales. Z. Wahrsch. Verw. Gebiete 31 235–253.
  • [25] Kadota, Y., Kurano, M. and Yasuda, M. (2006). Discounted Markov decision processes with utility constraints. Comput. Math. Appl. 51 279–284.
  • [26] Kakumanu, P. (1971). Continuously discounted Markov decision model with countable state and action space. Ann. Math. Statist. 42 919–926.
  • [27] Kitaev, M. Y. (1985). Semi-Markov and jump Markov controlled models: Average cost criterion. Theory Probab. Appl. 30 272–288.
  • [28] Kitaev, M. Y. and Rykov, V. V. (1995). Controlled Queueing Systems. CRC Press, Boca Raton, FL.
  • [29] Kurano, M., Nakagami, J.-I. and Huang, Y. (2000). Constrained Markov decision processes with compact state and action spaces: The average case. Optimization 48 255–269.
  • [30] Lewis, M. E. and Puterman, M. L. (2001). A probabilistic analysis of bias optimality in unichain Markov decision processes. IEEE Trans. Automat. Control 46 96–100.
  • [31] Lund, R. B., Meyn, S. P. and Tweedie, R. L. (1996). Computable exponential convergence rates for stochastically ordered Markov processes. Ann. Appl. Probab. 6 218–237.
  • [32] Phelps, R. R. (2001). Lectures on Choquet’s Theorem, 2nd ed. Lecture Notes in Math. 1757. Springer, Berlin.
  • [33] Piunovskiy, A. B. (1997). Optimal Control of Random Sequences in Problems with Constraints. Mathematics and Its Applications 410. Kluwer, Dordrecht.
  • [34] Piunovskiy, A. B. (1998). A controlled jump discounted model with constraints. Theory Probab. Appl. 42 51–72.
  • [35] Piunovskiy, A. B. (2005). Discounted continuous time Markov decision processes: The convex analytic approach. In 16th Triennial IFAC World Congress, Prague, Czech Republic.
  • [36] Prieto-Rumeau, T. and Hernández-Lerma, O. (2008). Ergodic control of continuous-time Markov chains with pathwise constraints. SIAM J. Control Optim. 47 1888–1908.
  • [37] Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York.
  • [38] Sennott, L. I. (1991). Constrained discounted Markov decision chains. Probab. Engrg. Inform. Sci. 5 463–475.
  • [39] Sennott, L. I. (1999). Stochastic Dynamic Programming and the Control of Queueing Systems. Wiley, New York.
  • [40] Yushkevich, A. A. (1977). Controlled Markov models with countable states and continuous time. Theory Probab. Appl. 22 215–235.
  • [41] Zadorojniy, A. and Shwartz, A. (2006). Robustness of policies in constrained Markov decision processes. IEEE Trans. Automat. Control 51 635–638.
  • [42] Zhang, L. L. and Guo, X. P. (2008). Constrained continuous-time Markov decision processes with average criteria. Math. Methods Oper. Res. 67 323–340.