Advances in Applied Probability

The expected total cost criterion for Markov decision processes under constraints: a convex analytic approach

François Dufour, M. Horiguchi, and A. B. Piunovskiy

Abstract

This paper deals with discrete-time Markov decision processes (MDPs) under constraints, where all the objectives have the same form of expected total cost over the infinite time horizon. The existence of an optimal control policy is established via the convex analytic approach. We work under the assumptions that the state and action spaces are general Borel spaces, that the model is nonnegative and semicontinuous, and that the associated linear program admits a feasible solution with finite cost. It is worth noting that, in contrast to the classical results in the literature, our hypotheses do not require the MDP to be transient or absorbing. Our first result ensures that the linear program has an optimal solution given by the occupation measure of a process generated by a randomized stationary policy. Moreover, it is shown that this randomized stationary policy provides an optimal solution to the constrained Markov control problem itself. As a consequence, the set of randomized stationary policies is sufficient for this optimal control problem. Finally, our last main result states that all optimal solutions of the linear program coincide on a special set with an optimal occupation measure generated by a randomized stationary policy. Several examples illustrate the theoretical issues involved and possible applications of the results developed in the paper.
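To fix ideas, the linear program referred to in the abstract can be sketched as follows. This is the standard occupation-measure formulation from the constrained-MDP literature (see, e.g., Altman (1999) and Piunovskiy (1997) in the references below); the symbols X, A, Q, ν, c_k, d_k are a gloss on the abstract, not notation quoted from the paper.

```latex
% Constrained LP over occupation measures (standard form; the notation is
% a gloss, not quoted from the paper).  X, A: Borel state and action
% spaces; Q: transition kernel; \nu: initial distribution; c_0: objective
% cost; c_1, ..., c_q: constraint costs with bounds d_1, ..., d_q.
\begin{align*}
  \text{minimize}   \quad & \int_{X \times A} c_0 \,\mathrm{d}\mu \\
  \text{subject to} \quad & \int_{X \times A} c_k \,\mathrm{d}\mu \le d_k,
                            \qquad k = 1, \dots, q, \\
                          & \mu(\Gamma \times A) = \nu(\Gamma)
                            + \int_{X \times A} Q(\Gamma \mid x, a)\,
                              \mu(\mathrm{d}x, \mathrm{d}a)
                            \quad \text{for every Borel set } \Gamma \subseteq X.
\end{align*}
```

In the finite case this reduces to an ordinary linear program, which makes the abstract's claims easy to experiment with. The toy instance below, in which the states, actions, costs, transition probabilities, and budget are all invented for illustration (SciPy is assumed), solves the LP and then disintegrates the optimal occupation measure into a randomized stationary policy, mirroring the sufficiency result stated above.

```python
# Toy finite-state sketch of the occupation-measure LP for a constrained MDP
# with expected total cost.  The instance is invented for illustration and
# is not taken from the paper.
import numpy as np
from scipy.optimize import linprog

nX, nA = 2, 2                      # transient states {0, 1}; absorption is implicit
nu = np.array([1.0, 0.0])          # initial distribution on the transient states

# P[x, a, y]: probability of moving to transient state y; the missing mass
# 1 - sum_y P[x, a, y] is absorbed at zero cost.
P = np.zeros((nX, nA, nX))
P[:, 0, :] = 0.1 * np.eye(nX)      # action 0: stay w.p. 0.1, absorb w.p. 0.9
P[0, 1, 1] = 0.5                   # action 1: switch state w.p. 0.5, absorb w.p. 0.5
P[1, 1, 0] = 0.5

c0 = np.array([[2.0, 1.0],         # objective cost c0(x, a): action 1 is cheaper ...
               [2.0, 1.0]])
c1 = np.array([[0.0, 1.0],         # ... but consumes the constrained resource c1(x, a)
               [0.0, 1.0]])
d = 0.8                            # budget: E[total c1-cost] <= d

# Variables mu(x, a) >= 0, flattened as mu[x * nA + a].  Characteristic
# equation: sum_a mu(y, a) - sum_{x, a} P[x, a, y] mu(x, a) = nu(y).
A_eq = np.zeros((nX, nX * nA))
for y in range(nX):
    for x in range(nX):
        for a in range(nA):
            A_eq[y, x * nA + a] = float(x == y) - P[x, a, y]

res = linprog(c0.ravel(),
              A_ub=c1.ravel()[None, :], b_ub=[d],
              A_eq=A_eq, b_eq=nu, bounds=(0, None))
assert res.success, res.message

mu = res.x.reshape(nX, nA)
policy = mu / mu.sum(axis=1, keepdims=True)   # pi(a | x) by disintegrating mu
print("optimal expected total cost:", res.fun)
print("pi(a | x):\n", policy)
```

The disintegration in the last step is exactly the mechanism behind the sufficiency of randomized stationary policies; the paper's contribution is to establish this on general Borel spaces, without the transience or absorption hypotheses that a toy instance like this one satisfies automatically.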

Article information

Source
Adv. in Appl. Probab., Volume 44, Number 3 (2012), 774–793.

Dates
First available in Project Euclid: 6 September 2012

Permanent link to this document
https://projecteuclid.org/euclid.aap/1346955264

Digital Object Identifier
doi:10.1239/aap/1346955264

Mathematical Reviews number (MathSciNet)
MR3024609

Zentralblatt MATH identifier
1286.90161

Subjects
Primary: 90C40: Markov and semi-Markov decision processes
Secondary: 60J10: Markov chains (discrete-time Markov processes on discrete state spaces) 90C90: Applications of mathematical programming

Keywords
Markov decision process; expected total cost criterion; constraint; linear programming; occupation measure

Citation

Dufour, François; Horiguchi, M.; Piunovskiy, A. B. The expected total cost criterion for Markov decision processes under constraints: a convex analytic approach. Adv. in Appl. Probab. 44 (2012), no. 3, 774–793. doi:10.1239/aap/1346955264. https://projecteuclid.org/euclid.aap/1346955264


References

  • Altman, E. (1999). Constrained Markov Decision Processes. Chapman & Hall/CRC, Boca Raton, FL.
  • Bäuerle, N. and Rieder, U. (2011). Markov Decision Processes with Applications to Finance. Springer, Heidelberg.
  • Bertsekas, D. P. (1987). Dynamic Programming. Prentice Hall, Englewood Cliffs, NJ.
  • Bertsekas, D. P. and Shreve, S. E. (1978). Stochastic Optimal Control (Math. Sci. Eng. 139). Academic Press, New York.
  • Borkar, V. S. (2002). Convex analytic methods in Markov decision processes. In Handbook of Markov Decision Processes (Internat. Ser. Operat. Res. Manag. Sci. 40), Kluwer, Boston, MA, pp. 347–375.
  • Dufour, F. and Piunovskiy, A. B. (2010). Multiobjective stopping problem for discrete-time Markov processes: convex analytic approach. J. Appl. Prob. 47, 947–966.
  • Feinberg, E. A. (2002). Total reward criteria. In Handbook of Markov Decision Processes (Internat. Ser. Operat. Res. Manag. Sci. 40), Kluwer, Boston, MA, pp. 173–207.
  • Hernández-Lerma, O. and Lasserre, J. B. (1996). Discrete-Time Markov Control Processes (Appl. Math. 30). Springer, New York.
  • Hernández-Lerma, O. and Lasserre, J. B. (1999). Further Topics on Discrete-Time Markov Control Processes (Appl. Math. 42). Springer, New York.
  • Horiguchi, M. (2001). Markov decision processes with a stopping time constraint. Math. Meth. Operat. Res. 53, 279–295.
  • Horiguchi, M. (2001). Stopped Markov decision processes with multiple constraints. Math. Meth. Operat. Res. 54, 455–469.
  • Luenberger, D. G. and Ye, Y. (2010). Linear and Nonlinear Programming (Internat. Ser. Operat. Res. Manag. Sci. 116), 3rd edn. Springer, New York.
  • Piunovskiy, A. B. (1997). Optimal Control of Random Sequences in Problems with Constraints (Math. Appl. 410). Kluwer, Dordrecht.
  • Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, New York.
  • Rockafellar, R. T. (1970). Convex Analysis (Princeton Math. Ser. 28). Princeton University Press.
  • Schäl, M. (1975). On dynamic programming: compactness of the space of policies. Stoch. Process. Appl. 3, 345–364.