Bernoulli

Zero-sum continuous-time Markov games with unbounded transition and discounted payoff rates

Xianping Guo and Onésimo Hernández-Lerma

Full-text: Open access

Abstract

This paper is concerned with two-person zero-sum games for continuous-time Markov chains, with possibly unbounded payoff and transition rate functions, under the discounted payoff criterion. We give conditions under which the existence of the value of the game and a pair of optimal stationary strategies is ensured by using the optimality (or Shapley) equation. We prove the convergence of the value iteration scheme to the game's value and to a pair of optimal stationary strategies. Moreover, when the transition rates are bounded we further show that the convergence of value iteration is exponential. Our results are illustrated with a controlled queueing system with unbounded transition and reward rates.
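The value iteration scheme mentioned above can be sketched in a discrete-time analogue of the model (for bounded transition rates, uniformization reduces the continuous-time chain to such a form). The sketch below iterates the Shapley operator, solving a 2x2 auxiliary matrix game at each state; the two-state game, payoffs, discount factor, and transition law are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: Shapley value iteration for a zero-sum discounted Markov
# game, in a discrete-time analogue of the paper's model. The example game
# below is an illustrative assumption, not from the paper.

def matrix_game_value(A):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    lower = max(min(row) for row in A)                       # maximin over pure strategies
    upper = min(max(row[j] for row in A) for j in range(2))  # minimax over pure strategies
    if lower == upper:   # saddle point in pure strategies
        return lower
    a, b = A[0]
    c, d = A[1]
    # Mixed-strategy value of a 2x2 game with no pure saddle point
    return (a * d - b * c) / (a + d - b - c)

def shapley_iteration(states, payoff, trans, beta, tol=1e-10):
    """Iterate V <- val( r + beta * sum_t p(t|s,a,b) V(t) ), a beta-contraction."""
    V = {s: 0.0 for s in states}
    while True:
        newV = {}
        for s in states:
            # Auxiliary matrix game at state s (Shapley/optimality equation)
            G = [[payoff[s][i][j]
                  + beta * sum(trans[s][i][j][t] * V[t] for t in states)
                  for j in range(2)] for i in range(2)]
            newV[s] = matrix_game_value(G)
        gap = max(abs(newV[s] - V[s]) for s in states)
        V = newV
        if gap < tol:
            return V

# Illustrative two-state game: matching pennies at state 0, a
# coordination-style payoff at state 1; transitions are action-independent.
payoff = {0: [[1.0, -1.0], [-1.0, 1.0]],
          1: [[2.0, 0.0], [0.0, 2.0]]}
trans = {0: [[{0: 0.5, 1: 0.5} for _ in range(2)] for _ in range(2)],
         1: [[{0: 1.0, 1: 0.0} for _ in range(2)] for _ in range(2)]}
V = shapley_iteration([0, 1], payoff, trans, beta=0.5)
```

Because the Shapley operator is a beta-contraction in the sup norm, the iteration gap shrinks geometrically, mirroring the exponential convergence the paper establishes when the transition rates are bounded.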

Article information

Source
Bernoulli, Volume 11, Number 6 (2005), 1009-1029.

Dates
First available in Project Euclid: 16 January 2006

Permanent link to this document
https://projecteuclid.org/euclid.bj/1137421638

Digital Object Identifier
doi:10.3150/bj/1137421638

Mathematical Reviews number (MathSciNet)
MR2188839

Zentralblatt MATH identifier
1125.91016

Keywords
controlled Q-process; discounted payoffs; value of the game; zero-sum Markov games

Citation

Guo, Xianping; Hernández-Lerma, Onésimo. Zero-sum continuous-time Markov games with unbounded transition and discounted payoff rates. Bernoulli 11 (2005), no. 6, 1009--1029. doi:10.3150/bj/1137421638. https://projecteuclid.org/euclid.bj/1137421638
