## Bernoulli


### Constrained total undiscounted continuous-time Markov decision processes

#### Abstract

The present paper considers the constrained optimal control problem with total undiscounted criteria for a continuous-time Markov decision process (CTMDP) in Borel state and action spaces. The cost rates are nonnegative. Under the standard compactness and continuity conditions, we show the existence of an optimal stationary policy within the class of general nonstationary policies. Along the way, we justify the reduction of the CTMDP model to a discrete-time Markov decision process (DTMDP) model through a study of the undiscounted occupancy and occupation measures. The controlled process need not be absorbing, and the transition rates need not be separated from zero and may be arbitrarily unbounded; these features account for the main technical difficulties in studying undiscounted CTMDP models.
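For orientation, the constrained problem described in the abstract can be stated schematically as below. The notation (cost rates $c_j$, constraint constants $d_j$, initial distribution $\gamma$, occupation measure $\eta^{\pi}$) is standard for constrained MDPs and is assumed here for illustration, not taken verbatim from the paper.

```latex
% Schematic statement of a constrained total undiscounted CTMDP
% (standard notation assumed). Minimize, over all (possibly
% nonstationary) policies \pi, the expected total cost for the
% cost rate c_0:
\min_{\pi}\; W_0(\pi,\gamma)
  := \mathbb{E}^{\pi}_{\gamma}\!\left[\int_0^\infty c_0(x_t,a_t)\,\mathrm{d}t\right]
% subject to N constraints on the remaining nonnegative cost rates:
\text{s.t.}\quad
  W_j(\pi,\gamma)
  := \mathbb{E}^{\pi}_{\gamma}\!\left[\int_0^\infty c_j(x_t,a_t)\,\mathrm{d}t\right]
  \le d_j, \qquad j = 1,\dots,N.
% The (undiscounted) occupation measure \eta^{\pi} turns each objective
% into a linear functional,
%   W_j(\pi,\gamma) = \int c_j \,\mathrm{d}\eta^{\pi},
% which is the structure that underlies convex-analytic arguments and
% the reduction of the CTMDP to a DTMDP.
```

In this schematic view, the constrained problem becomes a linear program over occupation measures, which is the usual route by which stationary optimal policies are extracted in the constrained MDP literature.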

#### Article information

**Source**
Bernoulli, Volume 23, Number 3 (2017), 1694-1736.

**Dates**
Revised: September 2015
First available in Project Euclid: 17 March 2017

https://projecteuclid.org/euclid.bj/1489737622

**Digital Object Identifier**
doi:10.3150/15-BEJ793

**Mathematical Reviews number (MathSciNet)**
MR3624875

**Zentralblatt MATH identifier**
06714316

#### Citation

Guo, Xianping; Zhang, Yi. Constrained total undiscounted continuous-time Markov decision processes. Bernoulli 23 (2017), no. 3, 1694--1736. doi:10.3150/15-BEJ793. https://projecteuclid.org/euclid.bj/1489737622
