## The Annals of Statistics

### On Maximal Rewards and $\varepsilon$-Optimal Policies in Continuous Time Markov Decision Chains

Mark R. Lembersky

#### Abstract

For continuous time Markov decision chains of finite duration, we show that the vector of maximal total rewards, less a linear average-return term, converges as the duration $t \rightarrow \infty$. We then show that there are policies which are both simultaneously $\varepsilon$-optimal for all durations $t$ and are stationary except possibly for a final, finite segment. Further, the length of this final segment depends on $\varepsilon$, but not on $t$ for large enough $t$, while the initial stationary part of the policy is independent of both $\varepsilon$ and $t$.
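
The two claims in the abstract can be written out symbolically. The notation below is assumed for illustration and is not taken from the paper: $V^t$ denotes the vector of maximal total rewards over duration $t$, $g$ the vector of average-return rates, $w$ a finite limit vector, $V^t_{\pi}$ the total reward vector of a policy $\pi$, and $\varepsilon$-optimality is read in the usual componentwise sense.

$$
\lim_{t \to \infty} \bigl( V^t - t\,g \bigr) = w,
\qquad
V^t_{\pi} \ge V^t - \varepsilon \mathbf{1} \quad \text{for all durations } t .
$$

Under the same assumed notation, the policy $\pi$ in the second claim would follow a fixed stationary rule while the remaining time exceeds some $T(\varepsilon)$, and could be non-stationary only during the last $T(\varepsilon)$ time units. The stationary rule itself would not change with $\varepsilon$ or $t$.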

#### Article information

Source
Ann. Statist., Volume 2, Number 1 (1974), 159-169.

Dates
First available in Project Euclid: 12 April 2007

https://projecteuclid.org/euclid.aos/1176342621

Digital Object Identifier
doi:10.1214/aos/1176342621

Mathematical Reviews number (MathSciNet)
MR349239

Zentralblatt MATH identifier
0272.90083

Citation
Lembersky, Mark R. On Maximal Rewards and $\varepsilon$-Optimal Policies in Continuous Time Markov Decision Chains. Ann. Statist. 2 (1974), no. 1, 159-169. doi:10.1214/aos/1176342621. https://projecteuclid.org/euclid.aos/1176342621