The Annals of Statistics
- Ann. Statist.
- Volume 2, Number 1 (1974), 159-169.
On Maximal Rewards and $\varepsilon$-Optimal Policies in Continuous Time Markov Decision Chains
For continuous time Markov decision chains of finite duration, we show that the vector of maximal total rewards, less a linear average-return term, converges as the duration $t \rightarrow \infty$. We then show that there are policies that are simultaneously $\varepsilon$-optimal for all durations $t$ and stationary except possibly for a final, finite segment. Further, for large enough $t$ the length of this final segment depends on $\varepsilon$ but not on $t$, while the initial stationary part of the policy is independent of both $\varepsilon$ and $t$.
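In symbols, the two claims of the abstract can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $V(t)$ is the vector of maximal total rewards for duration $t$, $g$ is a vector of average returns per unit time, $w$ is the limit vector, $V^{\pi}(t)$ is the total-reward vector of a policy $\pi$, $\mathbf{1}$ is the vector of ones, and $\varepsilon$-optimality is read componentwise.

$$
\lim_{t \rightarrow \infty} \bigl[ V(t) - t\,g \bigr] = w ,
\qquad
V^{\pi_{\varepsilon}}(t) \;\ge\; V(t) - \varepsilon \mathbf{1} \quad \text{for all } t \ge 0 .
$$

Under this reading, $\pi_{\varepsilon}$ follows one fixed stationary rule until some time $T(\varepsilon)$ remains and may be nonstationary only on that final segment; for large enough $t$, $T(\varepsilon)$ depends on $\varepsilon$ but not on $t$, and the stationary rule depends on neither.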
First available in Project Euclid: 12 April 2007
Lembersky, Mark R. On Maximal Rewards and $\varepsilon$-Optimal Policies in Continuous Time Markov Decision Chains. Ann. Statist. 2 (1974), no. 1, 159--169. doi:10.1214/aos/1176342621. https://projecteuclid.org/euclid.aos/1176342621