The Annals of Statistics

Optimal strategies for a class of sequential control problems with precedence relations

Hock Peng Chan, Cheng-Der Fuh, and Inchi Hu


Consider the following multi-phase project management problem. Each project is divided into several phases. All projects enter the next phase at the same point, chosen by the decision maker based on observations up to that point. Within each phase, one can pursue the projects in any order. When a project is pursued with one unit of resource, its state changes according to a Markov chain. The probability distribution of the Markov chain is known up to an unknown parameter. When pursued, the project generates a random reward depending on the phase, the state of the project, and the unknown parameter. The decision maker faces two problems: (a) how to allocate resources to projects within each phase, and (b) when to enter the next phase, so that the total expected reward is as large as possible. In this paper we formulate the preceding problem as a stochastic scheduling problem and propose asymptotically optimal strategies, which minimize the shortfall from the perfect-information payoff. Concrete examples are given to illustrate our method.
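
The objective in the abstract can be written as a shortfall (regret). The notation below is assumed for illustration and is not taken from the paper: $N$ is the total number of resource units, $\theta$ the unknown parameter, $W_N(\theta)$ the total expected reward of a given strategy, and $W_N^*(\theta)$ the largest total expected reward attainable when $\theta$ is known.

```latex
% Hypothetical notation: N, \theta, W_N, W_N^* are illustrative symbols.
R_N(\theta) \;=\; W_N^*(\theta) \;-\; W_N(\theta) \;\ge\; 0
```

Under this reading, a strategy is asymptotically optimal if its $R_N(\theta)$ grows as slowly as possible as $N \to \infty$, for every value of $\theta$.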

Article information

Ann. Statist., Volume 35, Number 4 (2007), 1722-1748.

First available in Project Euclid: 29 August 2007

Primary: 62L05: Sequential design
Secondary: 62N99: None of the above, but in this section

Markov chains; multi-armed bandits; Kullback–Leibler number; likelihood ratio; optimal stopping; scheduling; single-machine job sequencing; Wald’s equation


Chan, Hock Peng; Fuh, Cheng-Der; Hu, Inchi. Optimal strategies for a class of sequential control problems with precedence relations. Ann. Statist. 35 (2007), no. 4, 1722–1748. doi:10.1214/009053606000001569.


  • Agrawal, R., Teneketzis, D. and Anantharam, V. (1989). Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: Finite parameter space. IEEE Trans. Automat. Control 34 258–267.
  • Agrawal, R., Teneketzis, D. and Anantharam, V. (1989). Asymptotically efficient adaptive allocation schemes for controlled Markov chains: Finite parameter space. IEEE Trans. Automat. Control 34 1249–1259.
  • Anantharam, V., Varaiya, P. and Walrand, J. (1987). Asymptotically efficient allocation rules for the multi-armed bandit problem with multiple plays. I. I.I.D. rewards. II. Markovian rewards. IEEE Trans. Automat. Control 32 968–976, 977–982.
  • Berry, D. A. and Fristedt, B. (1985). Bandit Problems. Sequential Allocation of Experiments. Chapman and Hall, London.
  • Chan, H. P., Fuh, C. D. and Hu, I. (2006). Multi-armed bandit problem with precedence relations. In Time Series and Related Topics. In Memory of Ching-Zong Wei (H.-C. Ho, C.-K. Ing and T. L. Lai, eds.) 223–235. IMS, Beachwood, OH.
  • Feldman, D. (1962). Contributions to the “two-armed bandit” problem. Ann. Math. Statist. 33 847–856.
  • Fuh, C.-D. and Hu, I. (2000). Asymptotically efficient strategies for a stochastic scheduling problem with order constraints. Ann. Statist. 28 1670–1695.
  • Gittins, J. C. (1989). Multi-Armed Bandit Allocation Indices. Wiley, Chichester.
  • Glazebrook, K. D. (1991). Strategy evaluation for stochastic scheduling problems with order constraints. Adv. in Appl. Probab. 23 86–104.
  • Glazebrook, K. D. (1996). On the undiscounted tax problem with precedence constraints. Adv. in Appl. Probab. 28 1123–1144.
  • Graves, T. and Lai, T. L. (1997). Asymptotically efficient adaptive choice of control laws in controlled Markov chains. SIAM J. Control Optim. 35 715–743.
  • Hu, I. and Lee, C.-W. J. (2003). Bayesian adaptive stochastic process termination. Math. Oper. Res. 28 361–381.
  • Hu, I. and Wei, C. Z. (1989). Irreversible adaptive allocation rules. Ann. Statist. 17 801–823.
  • Kadane, J. B. and Simon, H. A. (1977). Optimal strategies for a class of constrained sequential problems. Ann. Statist. 5 237–255.
  • Lai, T. L. (1987). Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15 1091–1114.
  • Lai, T. L. and Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Adv. in Appl. Math. 6 4–22.
  • Mandelbaum, A. and Vanderbei, R. J. (1981). Optimal stopping and supermartingales over partially ordered sets. Z. Wahrsch. Verw. Gebiete 57 253–264.
  • Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. Springer, London.
  • Ney, P. and Nummelin, E. (1987). Markov additive processes. I. Eigenvalue properties and limit theorems. Ann. Probab. 15 561–592.
  • Presman, È. L. and Sonin, I. N. (1990). Sequential Control with Incomplete Information. Academic Press, San Diego.
  • Robbins, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58 527–535.