Institute of Mathematical Statistics Collections

Model selection and sensitivity analysis for sequence pattern models

Mayetri Gupta

Full-text: Open access


In this article we propose a maximal a posteriori (MAP) criterion for model selection in the motif discovery problem and investigate conditions under which the MAP asymptotically gives a correct prediction of model size. We also investigate robustness of the MAP to prior specification and provide guidelines for choosing prior hyper-parameters for motif models based on sensitivity considerations.

Chapter information

N. Balakrishnan, Edsel A. Peña and Mervyn J. Silvapulle, eds., Beyond Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen (Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2008), 390-407

First available in Project Euclid: 1 April 2008

Digital Object Identifier: doi:10.1214/193940307000000301

Primary: 62F15: Bayesian inference; 62P10: Applications to biology and medical sciences
Secondary: 62F12: Asymptotic properties of estimators

Keywords: Bayes factor; MAP; model selection; motif discovery

Copyright © 2008, Institute of Mathematical Statistics


Gupta, Mayetri. Model selection and sensitivity analysis for sequence pattern models. Beyond Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen, 390–407, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2008. doi:10.1214/193940307000000301.

