## The Annals of Statistics

### Aggregation of predictors for nonstationary sub-linear processes and online adaptive forecasting of time varying autoregressive processes

#### Abstract

In this work, we study the problem of aggregating a finite number of predictors for nonstationary sub-linear processes. We provide oracle inequalities relying essentially on three ingredients: (1) a uniform bound on the $\ell^{1}$ norm of the time varying sub-linear coefficients, (2) a Lipschitz assumption on the predictors and (3) moment conditions on the noise appearing in the linear representation. Two kinds of aggregation are considered, giving rise to different moment conditions on the noise and to oracle inequalities of varying sharpness. We apply this approach to derive an adaptive predictor for locally stationary time varying autoregressive (TVAR) processes. It is obtained by aggregating a finite number of well chosen predictors, each of them enjoying an optimal minimax convergence rate under specific smoothness conditions on the TVAR coefficients. We show that the obtained aggregated predictor achieves a minimax rate while adapting to the unknown smoothness. To prove this result, a lower bound is established for the minimax rate of the prediction risk for the TVAR process. Numerical experiments complete this study. An important feature of this approach is that the aggregated predictor can be computed recursively and is thus applicable in an online prediction context.
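To fix ideas, the recursive, online flavor of the aggregation step can be illustrated with a minimal sketch of one standard scheme, exponentially weighted averaging (as in Cesa-Bianchi and Lugosi, 2006, cited below). The toy TVAR(1) data, the plug-in experts and the learning rate `eta` are illustrative assumptions for this sketch, not the paper's exact predictors or tuning:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy locally stationary TVAR(1) data: X_t = theta(t/T) * X_{t-1} + eps_t,
# with a slowly varying coefficient theta (illustrative only).
T = 500
theta = 0.8 * np.sin(np.pi * np.arange(T) / T)  # time varying AR coefficient
X = np.zeros(T)
for t in range(1, T):
    X[t] = theta[t] * X[t - 1] + rng.normal()

# Experts: simple plug-in one-step predictors X_hat_t = a * X_{t-1}
# for a few fixed coefficients a (stand-ins for the paper's predictors).
coeffs = np.array([-0.5, 0.0, 0.5, 0.8])

# Exponentially weighted aggregation, computed recursively (online):
# weights depend only on the experts' past cumulative squared losses.
eta = 0.1  # learning rate; an illustrative choice, not the paper's calibration
cum_loss = np.zeros(len(coeffs))
agg_loss = 0.0
for t in range(1, T):
    preds = coeffs * X[t - 1]          # each expert's one-step forecast
    w = np.exp(-eta * cum_loss)
    w /= w.sum()                       # normalized exponential weights
    agg_pred = w @ preds               # aggregated forecast
    agg_loss += (X[t] - agg_pred) ** 2
    cum_loss += (X[t] - preds) ** 2    # recursive weight update

# An oracle inequality bounds agg_loss by the best expert's cumulative
# loss plus a remainder term.
best_expert_loss = cum_loss.min()
print(agg_loss, best_expert_loss)
```

The update uses only the previous observation and the running losses, which is what makes the aggregated predictor computable online, as emphasized in the abstract.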

#### Article information

**Source**
Ann. Statist., Volume 43, Number 6 (2015), 2412–2450.

**Dates**
Revised: May 2015
First available in Project Euclid: 7 October 2015

**Permanent link**
https://projecteuclid.org/euclid.aos/1444222080

**Digital Object Identifier**
doi:10.1214/15-AOS1345

**Mathematical Reviews number (MathSciNet)**
MR3405599

**Zentralblatt MATH identifier**
1327.62478

#### Citation

Giraud, Christophe; Roueff, François; Sanchez-Perez, Andres. Aggregation of predictors for nonstationary sub-linear processes and online adaptive forecasting of time varying autoregressive processes. Ann. Statist. 43 (2015), no. 6, 2412–2450. doi:10.1214/15-AOS1345. https://projecteuclid.org/euclid.aos/1444222080

#### References

• Alquier, P. and Wintenberger, O. (2012). Model selection for weakly dependent time series forecasting. Bernoulli 18 883–913.
• Anava, O., Hazan, E., Mannor, S. and Shamir, O. (2013). Online learning for time series prediction. Preprint. Available at arXiv:1302.6927.
• Arkoun, O. (2011). Sequential adaptive estimators in nonparametric autoregressive models. Sequential Anal. 30 229–247.
• Audibert, J.-Y. (2009). Fast learning rates in statistical inference through aggregation. Ann. Statist. 37 1591–1646.
• Brockwell, P. J. and Davis, R. A. (2006). Time Series: Theory and Methods. Springer, New York. Reprint of the second (1991) edition.
• Catoni, O. (1997). A mixture approach to universal model selection. Technical report, École Normale Supérieure.
• Catoni, O. (2004). Statistical Learning Theory and Stochastic Optimization. Lecture Notes in Math. 1851. Springer, Berlin.
• Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge Univ. Press, Cambridge.
• Dahlhaus, R. (1996). On the Kullback–Leibler information divergence of locally stationary processes. Stochastic Process. Appl. 62 139–168.
• Dahlhaus, R. (2009). Local inference for locally stationary time series based on the empirical spectral measure. J. Econometrics 151 101–112.
• Dahlhaus, R. and Polonik, W. (2006). Nonparametric quasi-maximum likelihood estimation for Gaussian locally stationary processes. Ann. Statist. 34 2790–2824.
• Dahlhaus, R. and Polonik, W. (2009). Empirical spectral processes for locally stationary time series. Bernoulli 15 1–39.
• Dalalyan, A. S. and Tsybakov, A. B. (2008). Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity. Mach. Learn. 72 39–61.
• Doukhan, P. and Wintenberger, O. (2008). Weakly dependent chains with infinite memory. Stochastic Process. Appl. 118 1997–2013.
• Gerchinovitz, S. (2011). Prediction of individual sequences and prediction in the statistical framework: Some links around sparse regression and aggregation techniques. Ph.D. thesis, Univ. Paris Sud-Paris XI.
• Giraud, C., Roueff, F. and Sanchez-Perez, A. (2015). Supplement to “Aggregation of predictors for nonstationary sub-linear processes and online adaptive forecasting of time varying autoregressive processes.” DOI:10.1214/15-AOS1345SUPP.
• Grenier, Y. (1983). Time-dependent ARMA modeling of nonstationary signals. IEEE Trans. Acoust. Speech Signal Process. 31 899–911.
• Juditsky, A. and Nemirovski, A. (2000). Functional aggregation for nonparametric regression. Ann. Statist. 28 681–712.
• Künsch, H. R. (1995). A note on causal solutions for locally stationary AR-processes. Unpublished preprint, ETH Zürich.
• Lepskiĭ, O. V. (1990). A problem of adaptive estimation in Gaussian white noise. Teor. Veroyatn. Primen. 35 459–470.
• Leung, G. and Barron, A. R. (2006). Information theory and mixing least-squares regressions. IEEE Trans. Inform. Theory 52 3396–3410.
• Massart, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1896. Springer, Berlin.
• Moulines, E., Priouret, P. and Roueff, F. (2005). On recursive estimation for time varying autoregressive processes. Ann. Statist. 33 2610–2654.
• Rigollet, P. and Tsybakov, A. B. (2012). Sparse estimation by exponential weighting. Statist. Sci. 27 558–575.
• Sancetta, A. (2010). Recursive forecast combination for dependent heterogeneous data. Econometric Theory 26 598–631.
• Stoltz, G. (2011). Contributions to the sequential prediction of arbitrary sequences: Applications to the theory of repeated games and empirical studies of the performance of the aggregation of experts. Habilitation à diriger des recherches, Univ. Paris Sud-Paris XI.
• Tong, H. and Lim, K. S. (1980). Threshold autoregression, limit cycles and cyclical data. J. Roy. Statist. Soc. Ser. B 42 245–292.
• Tsybakov, A. B. (2003). Optimal rates of aggregation. In Learning Theory and Kernel Machines (B. Schölkopf and M. K. Warmuth, eds.). Lecture Notes in Computer Science 2777 303–313. Springer, Berlin.
• Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer, New York.
• Vovk, V. G. (1990). Aggregating strategies. In Proc. Third Workshop on Computational Learning Theory 371–383. Morgan Kaufmann, San Mateo, CA.
• Wang, Z., Paterlini, S., Gao, F. and Yang, Y. (2014). Adaptive minimax regression estimation over sparse $\ell_{q}$-hulls. J. Mach. Learn. Res. 15 1675–1711.
• Yang, Y. (2000a). Combining different procedures for adaptive regression. J. Multivariate Anal. 74 135–161.
• Yang, Y. (2000b). Mixing strategies for density estimation. Ann. Statist. 28 75–87.
• Yang, Y. (2004). Combining forecasting procedures: Some theoretical results. Econometric Theory 20 176–222.

#### Supplemental materials

• Supplementary material for: Aggregation of predictors for nonstationary sub-linear processes and online adaptive forecasting of time varying autoregressive processes. We explain how to build nonadaptive minimax predictors which can be used in the aggregation step. The document also contains some technical proofs and provides additional results with improved aggregation rates.