Open Access
Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces
Quanxin Zhu, Xinsong Yang, Chuangxia Huang
Abstr. Appl. Anal. 2009: 1-17 (2009). DOI: 10.1155/2009/103723

Abstract

We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces. The transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. The optimality criterion is the expected average reward. We propose a set of conditions under which we first establish the average reward optimality equation and present the PIA. Then, under two slightly different sets of conditions, we show that the PIA yields the optimal (maximum) average reward, an average optimal stationary policy, and a solution to the average reward optimality equation.
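To make the two steps of the PIA concrete, here is a minimal sketch of the classical policy iteration for a continuous-time MDP under the average reward criterion. It is not the paper's construction: the paper works in Polish state and action spaces with possibly unbounded rates, whereas this sketch assumes finitely many states and actions, a conservative generator (off-diagonal rates q(y|x,a) >= 0, rows summing to zero), and that every stationary policy induces an irreducible chain, so that each evaluation system has a unique solution. Policy evaluation solves g = r(x,f(x)) + Σ_y q(y|x,f(x)) h(y) for the gain g and a bias h normalized by h(0) = 0; policy improvement picks, in each state, an action maximizing r(x,a) + Σ_y q(y|x,a) h(y). At a fixed point, (g, h) satisfies the average reward optimality equation g = max_a { r(x,a) + Σ_y q(y|x,a) h(y) }. The function names `evaluate_policy`, `improve_policy`, `policy_iteration` and the toy rates and rewards are hypothetical, chosen for illustration.

```python
import numpy as np


def evaluate_policy(policy, Q, R):
    """Solve g = r(x, f(x)) + sum_y q(y | x, f(x)) h(y) for all x, with h(0) = 0.

    Q has shape (n_states, n_actions, n_states): Q[x, a] is a generator row
    (off-diagonal rates >= 0, row sums 0). R has shape (n_states, n_actions).
    """
    n = len(policy)
    Qf = np.array([Q[x, policy[x]] for x in range(n)])  # generator of the chain under f
    rf = np.array([R[x, policy[x]] for x in range(n)])  # reward rates under f
    # Unknowns z = (g, h(1), ..., h(n-1)); h(0) is pinned to 0 to fix the additive constant.
    A = np.zeros((n, n))
    A[:, 0] = 1.0           # coefficient of the gain g
    A[:, 1:] = -Qf[:, 1:]   # -sum_y q(y|x) h(y); the y = 0 term vanishes since h(0) = 0
    z = np.linalg.solve(A, rf)  # nonsingular when the chain under f is irreducible (assumed)
    return z[0], np.concatenate(([0.0], z[1:]))


def improve_policy(policy, h, Q, R, tol=1e-9):
    """In each state, pick an action maximizing r(x, a) + sum_y q(y|x, a) h(y).

    The current action is kept whenever it is already a maximizer (within tol),
    which prevents the iteration from cycling between tied policies.
    """
    new_policy = list(policy)
    for x in range(len(policy)):
        scores = R[x] + Q[x] @ h  # one score per action in state x
        if scores[policy[x]] < scores.max() - tol:
            new_policy[x] = int(np.argmax(scores))
    return new_policy


def policy_iteration(Q, R, policy=None, max_iter=1000):
    """Alternate evaluation and improvement until the policy stops changing."""
    n_states = R.shape[0]
    policy = [0] * n_states if policy is None else list(policy)
    for _ in range(max_iter):
        gain, bias = evaluate_policy(policy, Q, R)
        new_policy = improve_policy(policy, bias, Q, R)
        if new_policy == policy:  # no strict improvement in any state: stop
            return policy, gain, bias
        policy = new_policy
    raise RuntimeError("policy iteration did not converge within max_iter steps")


# Hypothetical 2-state, 2-action example: Q_demo[x, a] is the generator row for (x, a).
Q_demo = np.array([[[-1.0, 1.0], [-3.0, 3.0]],
                   [[1.0, -1.0], [2.0, -2.0]]])
R_demo = np.array([[1.0, 0.0],
                   [4.0, 4.5]])
best_policy, gain, bias = policy_iteration(Q_demo, R_demo)
print(best_policy, gain, bias)
```

Keeping the current action on ties is a common tie-breaking rule in finite models: each iteration then either strictly improves the policy or stops, so the loop cannot cycle between equally good policies. In the toy example, the average reward of each stationary policy can be checked by hand from the two-state stationary distribution.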

Citation

Quanxin Zhu. Xinsong Yang. Chuangxia Huang. "Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces." Abstr. Appl. Anal. 2009: 1-17 (2009). https://doi.org/10.1155/2009/103723

Information

Published: 2009
First available in Project Euclid: 16 March 2010

zbMATH: 1192.90243
MathSciNet: MR2581137
Digital Object Identifier: 10.1155/2009/103723

Rights: Copyright © 2009 Hindawi