## The Annals of Statistics

### Asymptotically efficient strategies for a stochastic scheduling problem with order constraints

#### Abstract

Motivated by an application in computerized adaptive tests, we consider the following sequential design problem. There are $J$ jobs to be processed according to a predetermined order, and a single machine is available to process them. Each job under processing evolves stochastically as a Markov chain and earns rewards only while it is being processed. The Markov chain has transition probabilities parameterized by an unknown parameter $\theta$. The objective is to determine how long each job should be processed so that the total expected reward over an extended time interval is maximized. We define the regret associated with a strategy as the shortfall from the maximum expected reward under complete information on $\theta$. The problem is therefore equivalent to minimizing the regret. The asymptotic lower bound for the regret associated with any uniformly good strategy is characterized by a deterministic constrained minimization problem. Without knowledge of the parameter value, we construct a class of efficient strategies, based on the theory of sequential testing, that achieve the lower bound.
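The regret described in the abstract can be written out as follows. This is only a sketch: the symbols $\varphi$ (a strategy), $N$ (the time horizon), $X_t$ (the reward earned at time $t$), $W_N$ and $R_N$ are notation assumed here for illustration and are not taken from the paper.

```latex
% Notation assumed for illustration, not taken from the paper:
%   \varphi : a strategy obeying the predetermined job order
%   N       : length of the time horizon
%   X_t     : reward earned at time t
\[
  W_N(\theta,\varphi) = E_\theta^{\varphi}\Bigl[\sum_{t=1}^{N} X_t\Bigr],
  \qquad
  W_N^{*}(\theta) = \sup_{\varphi'} W_N(\theta,\varphi'),
\]
\[
  R_N(\theta,\varphi) = W_N^{*}(\theta) - W_N(\theta,\varphi) \;\ge\; 0 .
\]
% The supremum is over order-respecting strategies that may use the true \theta.
```

Because $W_N^{*}(\theta)$ does not depend on the strategy $\varphi$, maximizing the expected reward $W_N(\theta,\varphi)$ and minimizing the regret $R_N(\theta,\varphi)$ select the same strategies, which is the equivalence the abstract states.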

#### Article information

Source
Ann. Statist., Volume 28, Number 6 (2000), 1670-1695.

Dates
First available in Project Euclid: 12 March 2002

Permanent link to this document
https://projecteuclid.org/euclid.aos/1015957475

Digital Object Identifier
doi:10.1214/aos/1015957475

Mathematical Reviews number (MathSciNet)
MR1835036

Zentralblatt MATH identifier
1105.62365

Subjects
Primary: 62L05: Sequential design
Secondary: 62N99: None of the above, but in this section

#### Citation

Fuh, Cheng-Der; Hu, Inchi. Asymptotically efficient strategies for a stochastic scheduling problem with order constraints. Ann. Statist. 28 (2000), no. 6, 1670--1695. doi:10.1214/aos/1015957475. https://projecteuclid.org/euclid.aos/1015957475
