The Annals of Statistics (Ann. Statist.), Volume 4, Number 5 (1976), 936-953.
Conditions for the Equivalence of Optimality Criteria in Dynamic Programming
This paper examines the relationships between optimality criteria that are commonly used for undiscounted, discrete-time, countable-state Markovian decision models. One approach, due to Blackwell, is to maximize the expected discounted total return as the discount factor approaches 1. Another, due to Veinott, is to maximize the Cesàro means of the finite-horizon expected returns as the horizon tends to infinity. A third, due to Derman, is to maximize the long-run average gain. Denardo, Miller, and Lippman showed that Blackwell's and Veinott's approaches are equivalent for finite state and action spaces. As shown here, that equivalence breaks down when the state space is countable. Moreover, policies optimal according to Blackwell's or Veinott's approach need not be optimal according to Derman's. On the positive side, fairly weak conditions are given under which Blackwell's and Veinott's criteria imply Derman's, and somewhat stronger conditions under which Blackwell's and Veinott's criteria are equivalent.
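The three criteria described above can be sketched in standard Markov decision process notation (the notation here is an illustrative assumption, not taken from the paper itself): for a policy \pi, write V_\beta(\pi) for the expected \beta-discounted total return and v_n(\pi) for the expected n-horizon total return.

```latex
% Blackwell: \pi^* dominates every policy as the discount factor tends to 1:
\liminf_{\beta \uparrow 1}\, \bigl[ V_\beta(\pi^*) - V_\beta(\pi) \bigr] \;\ge\; 0
  \quad \text{for all policies } \pi.

% Veinott: \pi^* dominates in the Cesàro means of finite-horizon returns:
\liminf_{N \to \infty}\, \frac{1}{N} \sum_{n=1}^{N}
  \bigl[ v_n(\pi^*) - v_n(\pi) \bigr] \;\ge\; 0
  \quad \text{for all policies } \pi.

% Derman: \pi^* maximizes the long-run average gain:
\liminf_{N \to \infty}\, \frac{v_N(\pi^*)}{N}
  \;\ge\; \limsup_{N \to \infty}\, \frac{v_N(\pi)}{N}
  \quad \text{for all policies } \pi.
```

With finite state and action spaces these notions coincide (Denardo-Miller-Lippman for the first two); the paper's point is that with countably many states the first two can diverge from each other and from the third.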
First available in Project Euclid: 12 April 2007
Secondary:
- 62L99: None of the above, but in this section
- 90C40: Markov and semi-Markov decision processes
- 93C55: Discrete-time systems
- 60J10: Markov chains (discrete-time Markov processes on discrete state spaces)
- 60J20: Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) [See also 90B30, 91D10, 91D35, 91E40]
Flynn, James. Conditions for the Equivalence of Optimality Criteria in Dynamic Programming. Ann. Statist. 4 (1976), no. 5, 936--953. doi:10.1214/aos/1176343590. https://projecteuclid.org/euclid.aos/1176343590