The Annals of Applied Probability

Convergence rate of linear two-time-scale stochastic approximation

Vijay R. Konda and John N. Tsitsiklis

Full-text: Open access

Abstract

We study the rate of convergence of linear two-time-scale stochastic approximation methods. We consider two-time-scale linear iterations driven by i.i.d. noise, prove some results on their asymptotic covariance and establish asymptotic normality. The well-known result [Polyak, B. T. (1990). Automat. Remote Control 51 937–946; Ruppert, D. (1988). Technical Report 781, Cornell Univ.] on the optimality of Polyak–Ruppert averaging techniques specialized to linear stochastic approximation is established as a consequence of the general results in this paper.
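As a rough illustration of the kind of iteration the abstract refers to, the sketch below runs a scalar linear stochastic approximation with i.i.d. Gaussian noise together with a running average of its iterates. It treats the raw iterate as the fast component (step size k^(-2/3)) and the average as the slow component (step size 1/k). The function name, the coefficients a and b, the noise distribution and the step-size exponent are all assumptions chosen for this example; none of them is taken from the paper.

```python
import random


def averaged_linear_sa(a=2.0, b=4.0, steps=20000, seed=0):
    """Illustrative sketch: solve a*x = b from noisy observations.

    All names and parameter values here are hypothetical choices for
    the example, not notation or settings from the paper.
    """
    rng = random.Random(seed)
    x = 0.0      # fast iterate, step size k^(-2/3)
    x_bar = 0.0  # slow iterate: running average of x, step size 1/k
    for k in range(1, steps + 1):
        noise = rng.gauss(0.0, 1.0)  # i.i.d. zero-mean noise
        # Fast time scale: noisy linear update toward the root b/a.
        x += k ** (-2.0 / 3.0) * (b - a * x + noise)
        # Slow time scale: incremental form of the sample mean of x.
        x_bar += (x - x_bar) / k
    return x, x_bar


if __name__ == "__main__":
    x, x_bar = averaged_linear_sa()
    print("last iterate:", x)
    print("averaged iterate:", x_bar)
```

With these settings both values should land near b/a = 2, and the averaged iterate will typically fluctuate less than the last iterate, which is the behavior that the averaging results cited in the abstract make precise.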

Article information

Source
Ann. Appl. Probab., Volume 14, Number 2 (2004), 796-819.

Dates
First available in Project Euclid: 23 April 2004

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1082737112

Digital Object Identifier
doi:10.1214/105051604000000116

Mathematical Reviews number (MathSciNet)
MR2052903

Zentralblatt MATH identifier
1094.62103

Subjects
Primary: 62L20: Stochastic approximation

Keywords
Stochastic approximation; two-time-scales

Citation

Konda, Vijay R.; Tsitsiklis, John N. Convergence rate of linear two-time-scale stochastic approximation. Ann. Appl. Probab. 14 (2004), no. 2, 796--819. doi:10.1214/105051604000000116. https://projecteuclid.org/euclid.aoap/1082737112


References

  • Baras, J. S. and Borkar, V. S. (2000). A learning algorithm for Markov decision processes with adaptive state aggregation. In Proc. 39th IEEE Conference on Decision and Control. IEEE, New York.
  • Benveniste, A., Metivier, M. and Priouret, P. (1990). Adaptive Algorithms and Stochastic Approximations. Springer, Berlin.
  • Bhatnagar, S., Fu, M. C., Marcus, S. I. and Bhatnagar, S. (2001). Two timescale algorithms for simulation optimization of hidden Markov models. IIE Transactions 33 245--258.
  • Bhatnagar, S., Fu, M. C., Marcus, S. I. and Fard, P. J. (2001). Optimal structured feedback policies for ABR flow control using two timescale SPSA. IEEE/ACM Transactions on Networking 9 479--491.
  • Borkar, V. S. (1997). Stochastic approximation with two time scales. Systems Control Lett. 29 291--294.
  • Duflo, M. (1997). Random Iterative Models. Springer, Berlin.
  • Kokotovic, P. V. (1984). Applications of singular perturbation techniques to control problems. SIAM Rev. 26 501--550.
  • Konda, V. R. (2002). Actor-critic algorithms. Ph.D. dissertation, Dept. Electrical Engineering and Computer Science, MIT.
  • Konda, V. R. and Borkar, V. S. (1999). Actor-critic like learning algorithms for Markov decision processes. SIAM J. Control Optim. 38 94--123.
  • Konda, V. R. and Tsitsiklis, J. N. (2003). On actor-critic algorithms. SIAM J. Control Optim. 42 1143--1166.
  • Kushner, H. J. and Clark, D. S. (1978). Stochastic Approximation for Constrained and Unconstrained Systems. Springer, New York.
  • Kushner, H. J. and Yang, J. (1993). Stochastic approximation with averaging of the iterates: Optimal asymptotic rates of convergence for general processes. SIAM J. Control Optim. 31 1045--1062.
  • Kushner, H. J. and Yin, G. G. (1997). Stochastic Approximation Algorithms and Applications. Springer, New York.
  • Nevel'son, M. B. and Has'minskii, R. Z. (1973). Stochastic Approximation and Recursive Estimation. Amer. Math. Soc., Providence, RI.
  • Polyak, B. T. (1976). Convergence and convergence rate of iterative stochastic algorithms I. Automat. Remote Control 12 1858--1868.
  • Polyak, B. T. (1990). New method of stochastic approximation type. Automat. Remote Control 51 937--946.
  • Polyak, B. T. and Juditsky, A. B. (1992). Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30 838--855.
  • Ruppert, D. (1988). Efficient estimators from a slowly convergent Robbins--Monro procedure. Technical Report 781, School of Operations Research and Industrial Engineering, Cornell Univ.