The Annals of Statistics

On Maximal Rewards and $\varepsilon$-Optimal Policies in Continuous Time Markov Decision Chains

Mark R. Lembersky

Abstract

For continuous time Markov decision chains of finite duration, we show that the vector of maximal total rewards, less a linear average-return term, converges as the duration $t \rightarrow \infty$. We then show that there are policies that are simultaneously $\varepsilon$-optimal for all durations $t$ and stationary except possibly for a final, finite segment. Further, the length of this final segment depends on $\varepsilon$ but not on $t$ for large enough $t$, while the initial stationary part of the policy is independent of both $\varepsilon$ and $t$.
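
In common dynamic-programming notation (a restatement for orientation only; the symbols $V(t)$, $g$, and $w$ are assumed here, not taken from the paper), the convergence result can be written as

$$\lim_{t \rightarrow \infty} \left( V(t) - tg \right) = w$$

for some finite vector $w$, where $V(t)$ denotes the vector of maximal total rewards over a horizon of duration $t$ and $g$ the long-run average-return (gain) vector.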

Article information

Source
Ann. Statist., Volume 2, Number 1 (1974), 159-169.

Dates
First available in Project Euclid: 12 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1176342621

Digital Object Identifier
doi:10.1214/aos/1176342621

Mathematical Reviews number (MathSciNet)
MR349239

Zentralblatt MATH identifier
0272.90083

JSTOR
links.jstor.org

Subjects
Primary: 90C40: Markov and semi-Markov decision processes
Secondary: 90B99: None of the above, but in this section; 93E20: Optimal stochastic control

Keywords
Markov decision chains; maximal rewards; $\varepsilon$-optimal policies; initially stationary policies; dynamic programming

Citation

Lembersky, Mark R. On Maximal Rewards and $\varepsilon$-Optimal Policies in Continuous Time Markov Decision Chains. Ann. Statist. 2 (1974), no. 1, 159--169. doi:10.1214/aos/1176342621. https://projecteuclid.org/euclid.aos/1176342621
