Abstract
The paper deals with continuous-time Markov decision processes on a fairly general state space. The rewards are continuously discounted at rate $\alpha > 0$. A set of conditions is shown to be necessary and sufficient for a policy to be optimal. For the special case of a time-independent reward function, and under the assumption that the action space is finite, a policy improvement algorithm is proposed and its convergence to an optimal policy is proved.
Citation
Bharat T. Doshi. "Continuous Time Control of Markov Processes on an Arbitrary State Space: Discounted Rewards." Ann. Statist. 4 (6) 1219 - 1235, November, 1976. https://doi.org/10.1214/aos/1176343653
Information