The Annals of Statistics

Approximate group context tree

Alexandre Belloni and Roberto I. Oliveira



We study a variable length Markov chain model for a group of stationary processes that share the same context tree but may each have different conditional probabilities. We propose a new model selection and estimation method which is computationally efficient. We develop oracle and adaptivity inequalities, as well as model selection properties, that hold under continuity of the transition probabilities and polynomial $\beta$-mixing. In particular, model misspecification is allowed.

These results are applied to interesting families of processes. For Markov processes, we obtain a uniform rate of convergence for the estimation error of transition probabilities as well as perfect model selection results. For chains of infinite order with complete connections, we obtain explicit uniform rates of convergence for the estimation of conditional probabilities, which depend explicitly on the processes’ continuity rates. Similar guarantees are also derived for renewal processes.

Our results are shown to be applicable to discrete stochastic dynamic programming problems and to dynamic discrete choice models. We also apply our estimator to a linguistic study, based on recent work by Galves et al. [Ann. Appl. Stat. 6 (2012) 186–209], of the rhythmic differences between Brazilian and European Portuguese.
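To make the model in the abstract concrete, the following is a minimal Python sketch of a group context tree: several sequences share one set of contexts, while each sequence gets its own empirical conditional probabilities. It is an illustration only, not the authors' estimator. In particular, the tree is fixed by hand here, whereas the paper's method selects it from the data. The function names, the string encoding of contexts, and the toy sequences are all assumptions made for this example.

```python
from collections import defaultdict


def find_context(past, contexts):
    """Return the longest suffix of `past` that is a context in the tree.

    Contexts are strings written in time order, so the most recent
    symbol is the last character.
    """
    for k in range(len(past), -1, -1):  # try the longest suffix first
        suffix = past[len(past) - k:]
        if suffix in contexts:
            return suffix
    raise ValueError("no context matches; the tree is not complete")


def estimate_group_probabilities(sequences, contexts, alphabet):
    """Empirical next-symbol probabilities per sequence, for a shared tree.

    Returns one dict per sequence: context -> {symbol: probability}.
    Contexts never observed in a sequence are omitted from its dict.
    """
    max_len = max(len(c) for c in contexts)
    estimates = []
    for seq in sequences:
        counts = defaultdict(lambda: defaultdict(int))
        for t in range(max_len, len(seq)):
            ctx = find_context(seq[t - max_len:t], contexts)
            counts[ctx][seq[t]] += 1
        probs = {}
        for ctx, nxt in counts.items():
            total = sum(nxt.values())
            probs[ctx] = {a: nxt.get(a, 0) / total for a in alphabet}
        estimates.append(probs)
    return estimates


# Hypothetical shared tree over the binary alphabet: the context is "1" if
# the last symbol is 1, and otherwise depends on the symbol before it.
shared_contexts = {"1", "10", "00"}
group = ["110110110", "001001001"]  # toy data, one string per process
estimates = estimate_group_probabilities(group, shared_contexts, "01")
```

Both toy sequences are parsed with the same three contexts, yet their estimated probabilities given the context "10" differ, which is the kind of heterogeneity across processes that the group model allows.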

Article information

Ann. Statist., Volume 45, Number 1 (2017), 355-385.

Received: February 2015
Revised: December 2015
First available in Project Euclid: 21 February 2017

Primary: 62M05: Markov processes: estimation; 62M09: Non-Markovian processes: estimation; 62G05: Estimation
Secondary: 62P20: Applications to economics [See also 91Bxx]; 60J10: Markov chains (discrete-time Markov processes on discrete state spaces)

Keywords: Categorical time series; group context tree; dynamic discrete choice models; dynamic programming; model selection; VLMC


Belloni, Alexandre; Oliveira, Roberto I. Approximate group context tree. Ann. Statist. 45 (2017), no. 1, 355--385. doi:10.1214/16-AOS1455.


  • [1] Aguirregabiria, V. and Mira, P. (2010). Dynamic discrete choice structural models: A survey. J. Econometrics 156 38–67.
  • [2] Arellano, M. and Honoré, B. H. (2001). Panel data models: Some recent developments. Handb. Econom. 5 3229–3296.
  • [3] Bejerano, G. (2004). Algorithms for variable length Markov chain modeling. Bioinformatics 20 788–789.
  • [4] Belloni, A. and Oliveira, R. I. (2016). Supplement to “Approximate group context tree.” DOI:10.1214/16-AOS1455SUPP.
  • [5] Bertsekas, D. P. (1987). Dynamic Programming: Deterministic and Stochastic Models. Prentice Hall, Englewood Cliffs, NJ.
  • [6] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • [7] Browning, M. and Carro, J. M. (2010). Heterogeneity in dynamic discrete choice models. Econom. J. 13 1–39.
  • [8] Browning, M. and Carro, J. M. (2014). Dynamic binary outcome models with maximal heterogeneity. J. Econometrics 178 805–823.
  • [9] Bühlmann, P. (1999). Efficient and adaptive post-model-selection estimators. J. Statist. Plann. Inference 79 1–9.
  • [10] Bühlmann, P. (2000). Model selection for variable length Markov chains and tuning the context algorithm. Ann. Inst. Statist. Math. 52 287–315.
  • [11] Bühlmann, P. and Wyner, A. J. (1999). Variable length Markov chains. Ann. Statist. 27 480–513.
  • [12] Chernozhukov, V., Fernandez-Val, I., Hahn, J. and Newey, W. (2009). Identification and estimation of marginal effects in nonlinear panel models. Available at arXiv:0904.1990.
  • [13] Csiszár, I. and Shields, P. C. (1996). Redundancy rates for renewal and other processes. IEEE Trans. Inform. Theory 42 2065–2072.
  • [14] Csiszár, I. and Talata, Z. (2006). Context tree estimation for not necessarily finite memory processes, via BIC and MDL. IEEE Trans. Inform. Theory 52 1007–1016.
  • [15] Farias, V. F., Moallemi, C. C., Van Roy, B. and Weissman, T. (2010). Universal reinforcement learning. IEEE Trans. Inform. Theory 56 2441–2454.
  • [16] Ferrari, F. and Wyner, A. (2003). Estimation of general stationary processes by variable length Markov chains. Scand. J. Statist. 30 459–480.
  • [17] Galves, A., Galves, C., García, J. E., Garcia, N. L. and Leonardi, F. (2012). Context tree selection and linguistic rhythm retrieval from written texts. Ann. Appl. Stat. 6 186–209.
  • [18] Garivier, A. (2006). Redundancy of the context-tree weighting method on renewal and Markov renewal processes. IEEE Trans. Inform. Theory 52 5579–5586.
  • [19] Garivier, A. and Leonardi, F. (2011). Context tree selection: A unifying view. Stochastic Process. Appl. 121 2488–2506.
  • [20] Lepskiĭ, O. V. (1990). A problem of adaptive estimation in Gaussian white noise. Teor. Veroyatnost. i Primenen. 35 459–470.
  • [21] Lounici, K., Pontil, M., Tsybakov, A. B. and van de Geer, S. (2010). Taking advantage of sparsity in multi-task learning. In Proc. Computational Learning Theory Conference (COLT 2009).
  • [22] Obozinski, G., Wainwright, M. J. and Jordan, M. I. (2011). Support union recovery in high-dimensional multivariate regression. Ann. Statist. 39 1–47.
  • [23] Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York.
  • [24] Rissanen, J. (1983). A universal data compression system. IEEE Trans. Inform. Theory 29 656–664.
  • [25] Ross, S. (1983). Introduction to Stochastic Dynamic Programming. Academic Press, New York.
  • [26] Talata, Z. and Duncan, T. (2009). Unrestricted BIC context tree estimation for not necessarily finite memory processes. In 2009 IEEE International Symposium on Information Theory 724–728.
  • [27] Vert, J.-P. (2001). Adaptive context trees and text clustering. IEEE Trans. Inform. Theory 47 1884–1901.
  • [28] Willems, F. M. J., Shtarkov, Y. M. and Tjalkens, T. J. (1995). The context-tree weighting method: Basic properties. IEEE Trans. Inform. Theory 41 653–664.
  • [29] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49–67.

Supplemental materials

  • Supplement to “Approximate group context tree”. We provide additional discussion on the oracle context tree, proofs omitted from Section 5, a compendium of martingale results, minimax rates for chains with infinite connections, and simulation results.