Abstract
By a decision process is meant a pair $(X, \Gamma)$, where $X$ is an arbitrary set (the state space), and $\Gamma$ associates to each point $x$ in $X$ an arbitrary nonempty collection of discrete probability measures (actions) on $X$. In a decision process with nonnegative costs depending on the current state, the action taken, and the following state, there is always available a Markov strategy which uniformly (nearly) minimizes the expected total cost. If the costs are strictly positive and depend only on the current state, there is even a stationary strategy with the same property. In a decision process with a fixed goal $g$ in $X$, there is always a stationary strategy which uniformly (nearly) minimizes the expected time to the goal, and, if $X$ is countable, such a stationary strategy exists which also (nearly) maximizes the probability of reaching the goal.
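As a purely illustrative sketch (not the construction used in the paper, whose results are existence statements for arbitrary state spaces), a finite decision process $(X, \Gamma)$ with a goal can be written down and solved numerically. Everything named here is an assumption made for the example: the three-state space, the table `GAMMA`, the cost function `unit_cost`, and the use of plain value iteration. With unit cost charged at every non-goal state, the expected total cost coincides with the expected time to the goal.

```python
from typing import Callable, Dict, List

State = str
Action = Dict[State, float]  # a discrete probability measure on X

# Hypothetical toy process: X = {"a", "b", "g"}, with goal "g" made absorbing.
# GAMMA[x] is the nonempty collection of actions available at state x.
GAMMA: Dict[State, List[Action]] = {
    "a": [{"b": 1.0}, {"a": 0.25, "g": 0.75}],
    "b": [{"g": 1.0}],
    "g": [{"g": 1.0}],
}


def unit_cost(x: State, action_index: int, y: State) -> float:
    """Nonnegative cost of moving from x to y under the chosen action.

    Assumed here: cost 1 per step away from the goal, 0 at the goal.
    """
    return 0.0 if x == "g" else 1.0


def expected_cost(x: State, i: int, cost: Callable, v: Dict[State, float]) -> float:
    """One-step expected cost of action i at x, plus the continuation value."""
    return sum(p * (cost(x, i, y) + v[y]) for y, p in GAMMA[x][i].items())


def value_iteration(cost: Callable, sweeps: int = 200) -> Dict[State, float]:
    """Approximate the minimal expected total cost by repeated minimization."""
    v = {x: 0.0 for x in GAMMA}
    for _ in range(sweeps):
        v = {
            x: min(expected_cost(x, i, cost, v) for i in range(len(GAMMA[x])))
            for x in GAMMA
        }
    return v


def greedy_stationary(cost: Callable, v: Dict[State, float]) -> Dict[State, int]:
    """Stationary strategy: at each state, pick the action index minimizing cost."""
    return {
        x: min(range(len(GAMMA[x])), key=lambda i: expected_cost(x, i, cost, v))
        for x in GAMMA
    }


v = value_iteration(unit_cost)
policy = greedy_stationary(unit_cost, v)
```

In this toy instance the stationary strategy selects the second action at `"a"` (expected time to the goal about 4/3, against 2 for the route through `"b"`). A stationary strategy depends only on the current state, which is why it can be stored as a single state-to-action table.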
Citation
Stephen Demko, Theodore P. Hill. "Decision Processes with Total-Cost Criteria." Ann. Probab. 9 (2) 293-301, April 1981. https://doi.org/10.1214/aop/1176994470
Information