The Annals of Statistics

Aggregation of predictors for nonstationary sub-linear processes and online adaptive forecasting of time varying autoregressive processes

Christophe Giraud, François Roueff, and Andres Sanchez-Perez

Full-text: Open access


In this work, we study the problem of aggregating a finite number of predictors for nonstationary sub-linear processes. We provide oracle inequalities that rely on three main ingredients: (1) a uniform bound on the $\ell^{1}$ norm of the time varying sub-linear coefficients, (2) a Lipschitz assumption on the predictors, and (3) moment conditions on the noise appearing in the linear representation. Two kinds of aggregation are considered, which give rise to different moment conditions on the noise and to oracle inequalities of differing sharpness. We apply this approach to derive an adaptive predictor for locally stationary time varying autoregressive (TVAR) processes. This predictor is obtained by aggregating a finite number of well-chosen predictors, each of which enjoys an optimal minimax convergence rate under specific smoothness conditions on the TVAR coefficients. We show that the resulting aggregated predictor achieves a minimax rate while adapting to the unknown smoothness. To prove this result, we establish a lower bound for the minimax rate of the prediction risk for the TVAR process. Numerical experiments complete this study. An important feature of this approach is that the aggregated predictor can be computed recursively and is thus applicable in an online prediction context.
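
To make the recursive, online nature of the aggregation concrete, here is a minimal Python sketch of exponentially weighted aggregation over a finite set of predictors with squared loss. It is an illustration under stated assumptions, not the authors' procedure: the learning rate `eta`, the toy TVAR(1) coefficient function, the normalized LMS trackers used as experts, and their step sizes are all hypothetical choices that do not come from the abstract.

```python
import math
import random


def exp_weights(cum_losses, eta):
    """Normalized exponential weights from cumulative losses (shifted for stability)."""
    base = min(cum_losses)
    raw = [math.exp(-eta * (loss - base)) for loss in cum_losses]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate_online(expert_forecasts, observations, eta):
    """Aggregate expert forecasts recursively.

    expert_forecasts[t][i] is the forecast of expert i for observations[t],
    built from data observed before time t.
    Returns (aggregated forecasts, final weights).
    """
    cum_losses = [0.0] * len(expert_forecasts[0])
    aggregated = []
    for forecasts, y in zip(expert_forecasts, observations):
        weights = exp_weights(cum_losses, eta)  # depends on past losses only
        aggregated.append(sum(w * f for w, f in zip(weights, forecasts)))
        for i, f in enumerate(forecasts):  # one cheap update per expert
            cum_losses[i] += (f - y) ** 2
    return aggregated, exp_weights(cum_losses, eta)


def simulate_tvar1(n, seed=0):
    """Toy TVAR(1) path: X_t = theta(t/n) X_{t-1} + noise (assumed theta, for illustration)."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for t in range(n):
        theta = 0.8 * math.cos(2.0 * math.pi * t / n)
        x = theta * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def nlms_forecasts(path, step):
    """One-step forecasts from a normalized LMS tracker of the AR(1) coefficient."""
    theta_hat, prev, out = 0.0, 0.0, []
    for x in path:
        forecast = theta_hat * prev
        out.append(forecast)
        # normalized gradient step on the squared prediction error
        theta_hat += step * (x - forecast) * prev / (1.0 + step * prev * prev)
        prev = x
    return out


def demo(n=2000, steps=(0.005, 0.02, 0.1), eta=0.05):
    """Return (MSE of the aggregate, MSE of each expert, final weights)."""
    path = simulate_tvar1(n)
    per_expert = [nlms_forecasts(path, s) for s in steps]
    by_time = [list(f) for f in zip(*per_expert)]
    aggregated, weights = aggregate_online(by_time, path, eta)

    def mse(fc):
        return sum((f - x) ** 2 for f, x in zip(fc, path)) / n

    return mse(aggregated), [mse(f) for f in per_expert], weights
```

Calling `demo()` compares the aggregate's mean squared error with that of each tracker. The weights at time t depend only on losses accumulated before t, so each step costs a single pass over the experts, which is what makes the scheme usable online.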

Article information

Ann. Statist., Volume 43, Number 6 (2015), 2412-2450.

Received: May 2014
Revised: May 2015
First available in Project Euclid: 7 October 2015

Primary:
  • 62M20: Prediction [See also 60G25]; filtering [See also 60G35, 93E10, 93E11]
  • 62G99: None of the above, but in this section
  • 62M10: Time series, auto-correlation, regression, etc. [See also 91B84]
  • 68W27: Online algorithms

Keywords: Nonstationary time series; exponential weighted aggregation; online learning; time varying autoregressive processes; adaptive prediction


Giraud, Christophe; Roueff, François; Sanchez-Perez, Andres. Aggregation of predictors for nonstationary sub-linear processes and online adaptive forecasting of time varying autoregressive processes. Ann. Statist. 43 (2015), no. 6, 2412–2450. doi:10.1214/15-AOS1345.




Supplemental materials

  • Supplementary material for: Aggregation of predictors for nonstationary sub-linear processes and online adaptive forecasting of time varying autoregressive processes. We explain how to build nonadaptive minimax predictors which can be used in the aggregation step. The document also contains some technical proofs and provides additional results with improved aggregation rates.