The Annals of Applied Statistics

Accounting for time dependence in large-scale multiple testing of event-related potential data

Ching-Fan Sheu, Émeline Perthame, Yuh-shiow Lee, and David Causeur

Full-text: Open access


Event-related potentials (ERPs) are recordings of electrical activity along the scalp, time-locked to perceptual, motor and cognitive events. Because ERP signals are often rare and weak relative to the large between-subject variability, establishing significant associations between ERPs and behavioral (or experimental) variables of interest poses major challenges for statistical analysis.

Noting that ERP time dependence exhibits a block pattern suggesting strong local and long-range autocorrelation components, we propose a flexible factor modeling of dependence. An adaptive factor adjustment procedure is derived from a joint estimation of the signal and noise processes, given prior knowledge of the noise-alone intervals. A simulation study is presented using known signals embedded in a real dependence structure extracted from authentic ERP measurements. The proposed procedure performs well compared with existing multiple testing procedures and is more powerful at discovering interesting ERP features.
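
For readers unfamiliar with factor models of dependence, a generic textbook version can be written as below. It is given purely as an illustration under stated assumptions and is not necessarily the authors' exact specification; the symbols $\varepsilon$, $B$, $z$, $e$, $\Psi$, $T$ and $q$ are introduced here and do not appear in the abstract. Here $\varepsilon$ is the $T$-vector of noise over the $T$ time points of one ERP curve, $z$ is a $q$-vector of latent factors with $q \ll T$, $B$ is a $T \times q$ loading matrix, and $\Psi$ is a diagonal matrix of time-specific variances.

```latex
\varepsilon = B z + e,
\qquad \mathbb{E}(z) = 0,\ \operatorname{Var}(z) = I_q,
\qquad \mathbb{E}(e) = 0,\ \operatorname{Var}(e) = \Psi,
\qquad \operatorname{Cov}(z, e) = 0,
\]
\[
\operatorname{Var}(\varepsilon) = B B^{\top} + \Psi .
```

In a model of this kind, a small number of latent factors accounts for the correlation shared across time points, which is what makes a factor adjustment of the test statistics possible.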

Article information

Ann. Appl. Stat., Volume 10, Number 1 (2016), 219-245.

Received: July 2014
Revised: October 2015
First available in Project Euclid: 25 March 2016

Keywords: Dependence; ERP data; high-dimensional data; multiple testing


Sheu, Ching-Fan; Perthame, Émeline; Lee, Yuh-shiow; Causeur, David. Accounting for time dependence in large-scale multiple testing of event-related potential data. Ann. Appl. Stat. 10 (2016), no. 1, 219–245. doi:10.1214/15-AOAS888.


  • Allen, G. I., Grosenick, L. and Taylor, J. (2014). A generalized least-square matrix decomposition. J. Amer. Statist. Assoc. 109 145–159.
  • Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B. Stat. Methodol. 57 289–300.
  • Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 1165–1188.
  • Blair, R. C. and Karniski, W. (1993). An alternative method for significance testing of waveform difference potentials. Psychophysiology 30 518–524.
  • Buja, A. and Eyuboglu, N. (1992). Remarks on parallel analysis. Multivariate Behavioral Research 27 509–540.
  • Causeur, D. and Sheu, C. F. (2014). ERP: Significance analysis of Event-Related Potentials data. R package version 1.0.1.
  • Causeur, D., Friguet, C., Houée-Bigot, M. and Kloareg, M. (2011). Factor analysis for multiple testing (FAMT): An R package for large-scale significance testing under dependence. Journal of Statistical Software 40 1–19.
  • Causeur, D., Chu, M. C., Hsieh, S. and Sheu, C. F. (2012). A factor-adjusted multiple testing procedure for ERP data analysis. Behavior Research Methods 44 635–643.
  • Donoho, D. and Jin, J. (2008). Higher criticism thresholding: Optimal feature selection when useful features are rare and weak. Proc. Natl. Acad. Sci. USA 105 14790–14795.
  • Efron, B. (2007). Correlation and large-scale simultaneous significance testing. J. Amer. Statist. Assoc. 102 93–103.
  • Efron, B. (2010). Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Institute of Mathematical Statistics (IMS) Monographs 1. Cambridge Univ. Press, Cambridge.
  • Friguet, C., Kloareg, M. and Causeur, D. (2009). A factor model approach to multiple testing under dependence. J. Amer. Statist. Assoc. 104 1406–1415.
  • Gardiner, J. M., Gawlik, B. and Richardson-Klavehn, A. (1994). Maintenance rehearsal affects knowing, not remembering: Elaborative rehearsal affects remembering, not knowing. Psychonomic Bulletin & Review 1 107–110.
  • Groppe, D. M., Urbach, T. P. and Kutas, M. (2011a). Mass univariate analysis of event-related brain potentials/fields I: A critical tutorial review. Psychophysiology 48 1711–1725.
  • Groppe, D. M., Urbach, T. P. and Kutas, M. (2011b). Mass univariate analysis of event-related brain potentials/fields II: Simulation studies. Psychophysiology 48 1726–1737.
  • Guthrie, D. and Buchwald, J. S. (1991). Significance testing of difference potentials. Psychophysiology 28 240–244.
  • Handy, T. (2004). Event-Related Potentials. MIT Press, Cambridge, MA.
  • Jin, J. (2009). Impossibility of successful classification when useful features are rare and weak. Proc. Natl. Acad. Sci. USA 106 8859–8864.
  • Johnson, H. M. (1994). Processes of successful intentional forgetting. Psychological Bulletin 116 274–292.
  • Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika 32 443–482.
  • Lage-Castellanos, A., Martínez-Montes, E., Hernández-Cabrera, J. A. and Galán, L. (2010). False discovery rate and permutation test: An evaluation in ERP data analysis. Stat. Med. 29 63–74.
  • Lee, Y. S., Lee, H. M. and Fawcett, J. M. (2013). Intentional forgetting reduces color-naming interference: Evidence from item-method directed forgetting. Journal of Experimental Psychology: Learning, Memory, and Cognition 39 220–236.
  • Leek, J. T. and Storey, J. D. (2008). A general framework for multiple testing dependence. Proc. Natl. Acad. Sci. USA 105 18718–18723.
  • Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. and Storey, J. D. (2014). SVA: Surrogate Variable Analysis. R package version 3.6.0.
  • Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic Press, London.
  • Näätänen, R. (2003). Mismatch negativity: Clinical research and possible applications. Int. J. Psychophysiol. 48 179–188.
  • Paz-Caballero, M. D. and Menor, J. (1999). ERP correlates of directed forgetting effects in direct and indirect memory tests. European Journal of Cognitive Psychology 11 239–260.
  • Poldrack, R. A., Mumford, J. A. and Nichols, T. E. (2011). Handbook of Functional MRI Data Analysis. Cambridge Univ. Press, Cambridge.
  • Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (2007). Numerical Recipes: The Art of Scientific Computing, 3rd ed. Cambridge Univ. Press, Cambridge.
  • Rosburg, T., Boutros, N. N. and Ford, J. M. (2008). Reduced auditory evoked potential component N100 in schizophrenia—A critical review. Psychiatry Research 161 259–274.
  • Rubin, D. B. and Thayer, D. T. (1982). EM algorithms for ML factor analysis. Psychometrika 47 69–76.
  • Rugg, M. D. and Curran, T. (2007). Event-related potentials and recognition memory. Trends Cogn. Sci. (Regul. Ed.) 11 251–257.
  • Sheu, C. F., Perthame, E., Lee, Y. S. and Causeur, D. (2016). Supplement to “Accounting for time dependence in large-scale multiple testing of event-related potential data.” DOI:10.1214/15-AOAS888SUPP.
  • Sun, W. and Cai, T. T. (2009). Large-scale multiple testing under dependence. J. R. Stat. Soc. Ser. B. Stat. Methodol. 71 393–424.
  • Sun, Y., Zhang, N. R. and Owen, A. B. (2012). Multiple hypothesis testing adjusted for latent variables, with an application to the AGEMAP gene expression data. Ann. Appl. Stat. 6 1664–1688.
  • Sun, Y., Zhang, N. R. and Owen, A. B. (2014). LEAPP: Latent effect adjustment after primary projection. R package version 1.1.
  • Thomson, G. H. (1951). The Factorial Analysis of Human Ability. London Univ. Press, London.
  • van der Laan, M. J. and Dudoit, S. (2007). Multiple Testing Procedures with Applications to Genomics. Springer, New York.
  • Vul, E., Harris, C., Winkielman, P. and Pashler, H. (2009). Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspect. Psychol. Sci. 4 274–290.
  • Weiner, B. (1968). Motivated forgetting and the study of repression. J. Pers. 36 213–234.
  • Westfall, P. H. and Young, S. S. (1993). Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment. Wiley, New York.
  • Williams, L. M., Simms, E., Clark, C. R., Paul, R. H., Rowe, D. and Gordon, E. (2005). The test-retest reliability of a standardized neurocognitive and neurophysiological test battery: “neuromarker”. Int. J. Neurosci. 115 1605–1630.
  • Woolrich, M. W., Beckmann, C. F., Nichols, T. E. and Smith, S. M. (2009). Statistical analysis of fMRI data. In fMRI Techniques and Protocols (M. Filippi, ed.). Humana Press, New York.

Supplemental materials

  • Accounting for time dependence in large-scale multiple testing of event-related potential data: Online supplement. The impact of ERP time dependence on multiple testing results. To demonstrate how time dependence affects the ability of multiple testing procedures to identify a predetermined true signal, a simulation study is conducted in which ERP data are generated according to model (3.1). The study compares the GB procedure [Guthrie and Buchwald (1991)] with two FDR-controlling procedures: BH [Benjamini and Hochberg (1995)] and BY [Benjamini and Yekutieli (2001)]. The results highlight the instability of multiple testing results obtained with methods that ignore dependence among tests.
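
The two FDR-controlling procedures named above can be sketched in a few lines. This is a minimal illustration of the standard BH and BY step-up rules, not the code used in the supplement; the function name `step_up_rejections`, its parameters, and the example p-values are hypothetical and chosen only for this sketch.

```python
import numpy as np


def step_up_rejections(pvalues, alpha=0.05, dependence_correction=False):
    """Return a boolean mask of hypotheses rejected by a step-up FDR rule.

    dependence_correction=False applies the BH rule.
    dependence_correction=True applies the BY rule, which divides alpha
    by the harmonic sum 1 + 1/2 + ... + 1/m.
    (Hypothetical helper written for illustration only.)
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    if dependence_correction:
        alpha = alpha / np.sum(1.0 / np.arange(1, m + 1))

    order = np.argsort(p)                        # indices of p-values, ascending
    thresholds = alpha * np.arange(1, m + 1) / m  # alpha * rank / m
    below = p[order] <= thresholds

    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])         # largest rank meeting its threshold
        reject[order[: k + 1]] = True            # reject that rank and all smaller ones
    return reject


# Made-up p-values, one per time point, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
bh_mask = step_up_rejections(example_p, alpha=0.05)
by_mask = step_up_rejections(example_p, alpha=0.05, dependence_correction=True)
print("BH rejections:", int(bh_mask.sum()))
print("BY rejections:", int(by_mask.sum()))
```

On these made-up p-values BH rejects two hypotheses and BY rejects one, reflecting the more conservative threshold that BY uses to remain valid under arbitrary dependence. Neither rule models the dependence between time points, which is the limitation the simulation study examines.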