June 2015
First passage optimality and variance minimisation of Markov decision processes with varying discount factors
Xiao Wu, Xianping Guo
J. Appl. Probab. 52(2): 441-456 (June 2015). DOI: 10.1239/jap/1437658608


This paper deals with the first passage optimality and variance minimisation problems of discrete-time Markov decision processes (MDPs) with varying discount factors and unbounded rewards/costs. First, under suitable conditions slightly weaker than those in the previous literature on the standard (infinite horizon) discounted MDPs, we establish the existence and characterisation of the first passage expected-optimal stationary policies. Second, to further distinguish the expected-optimal stationary policies, we introduce the variance minimisation problem, prove that it is equivalent to a new first passage optimality problem of MDPs, and, thus, show the existence of a variance-optimal policy that minimises the variance over the set of all first passage expected-optimal stationary policies. Finally, we use a computable example to illustrate our main results and also to show the difference between the first passage optimality here and the standard discount optimality of MDPs in the previous literature.
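
As an illustration only, the sketch below shows one way a first passage criterion with state- and action-dependent discount factors could be solved by value iteration on a small finite MDP. The abstract does not state the paper's formal setup, so everything here is an assumption: the recursion (rewards accumulate until the first entry into a target set, and each step is discounted by a factor depending on the current state and action), the function names, and the toy numbers. The unbounded-reward setting treated in the paper is not covered.

```python
# Sketch under stated assumptions; all names and numbers are hypothetical.

def q_value(s, a, V, P, r, alpha, target):
    """One-step value: reward now plus discounted continuation.

    Successors inside `target` contribute nothing, because the first
    passage criterion stops accumulating reward on entry to the target.
    """
    cont = sum(p * V[t] for t, p in P[s][a].items() if t not in target)
    return r[s][a] + alpha[s][a] * cont


def first_passage_value_iteration(states, target, actions, P, r, alpha,
                                  tol=1e-12, max_iter=100_000):
    """Return (V, policy) maximising expected discounted reward before hitting `target`.

    P[s][a] maps next states to probabilities, r[s][a] is a reward and
    alpha[s][a] is a discount factor in [0, 1).
    """
    V = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new_V = {}
        for s in states:
            if s in target:
                new_V[s] = 0.0  # value is fixed at 0 on the target set
            else:
                new_V[s] = max(q_value(s, a, V, P, r, alpha, target)
                               for a in actions[s])
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            break
    # Greedy stationary policy with respect to the converged values.
    policy = {s: max(actions[s],
                     key=lambda a: q_value(s, a, V, P, r, alpha, target))
              for s in states if s not in target}
    return V, policy


# Toy model: states 0 and 1 are transient, state 2 is the target.
states = [0, 1, 2]
target = {2}
actions = {0: ["a", "b"], 1: ["stay", "go"]}
P = {0: {"a": {1: 1.0}, "b": {2: 1.0}},
     1: {"stay": {1: 1.0}, "go": {2: 1.0}}}
r = {0: {"a": 1.0, "b": 2.0}, 1: {"stay": 1.0, "go": 4.0}}
alpha = {0: {"a": 0.5, "b": 0.9}, 1: {"stay": 0.5, "go": 0.5}}

V, policy = first_passage_value_iteration(states, target, actions, P, r, alpha)
print(V)       # {0: 3.0, 1: 4.0, 2: 0.0}
print(policy)  # {0: 'a', 1: 'go'}
```

In this toy model the expected-optimal policy is unique, so no tie-breaking is needed. When several stationary policies attain the same expected value, the variance criterion described in the abstract would be the natural way to choose among them; that step is not implemented here.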



Xiao Wu. Xianping Guo. "First passage optimality and variance minimisation of Markov decision processes with varying discount factors." J. Appl. Probab. 52 (2) 441 - 456, June 2015. https://doi.org/10.1239/jap/1437658608


Published: June 2015
First available in Project Euclid: 23 July 2015

zbMATH: 1327.90374
MathSciNet: MR3372085
Digital Object Identifier: 10.1239/jap/1437658608

Primary: 90C40
Secondary: 60J27, 93E20

Keywords: Discrete-time Markov decision process, first passage optimality, unbounded reward, variance minimisation, varying discount factor

Rights: Copyright © 2015 Applied Probability Trust


