Institute of Mathematical Statistics Collections

An asymptotically normal test for the selective neutrality hypothesis

Aluísio Pinheiro, Hildete P. Pinheiro, and Samara Kiihl

Full-text: Open access


An important parameter in the study of population evolution is θ=4Nν, where N is the effective population size and ν is the rate of mutation per locus per generation. Therefore, θ represents the mean number of mutations per site per generation. There are many estimators of θ, one of them being the mean number of pairwise nucleotide differences, which we call T2. Other estimators are T1, based on the number of segregating sites, and T3, based on the number of singletons. The concept of selective neutrality can be interpreted as a differentiated nucleotide distribution for mutant sites when compared to the overall nucleotide distribution. Tajima (1989) proposed the so-called Tajima’s test of selective neutrality, based on T2 − T1. Its complex empirical behavior (Kiihl, 2005) motivates us to propose a test statistic based solely on T2. We are thus able to prove asymptotic normality under different assumptions on the number of sequences and number of sites via U-statistics theory.
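As a rough illustration of the quantities named in the abstract, the sketch below computes the mean number of pairwise nucleotide differences and a segregating-sites estimate for a small alignment. It assumes the textbook forms of these estimators (average Hamming distance over all pairs, and the number of segregating sites divided by the harmonic sum a_n); the chapter's own normalization of T1 and T2 may differ, and the function names and toy sequences are hypothetical. The pairwise average is a U-statistic of degree 2 with the Hamming distance as kernel, which is consistent with the abstract's appeal to U-statistics theory.

```python
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_differences(seqs):
    """Average Hamming distance over all unordered pairs (assumed form of T2)."""
    pairs = list(combinations(seqs, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def segregating_sites(seqs):
    """Count alignment columns that show more than one nucleotide."""
    return sum(len(set(col)) > 1 for col in zip(*seqs))


def segregating_sites_estimate(seqs):
    """Segregating sites divided by a_n = sum_{i=1}^{n-1} 1/i (assumed form of T1)."""
    a_n = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a_n


# Toy alignment, made up for illustration: 4 sequences, 8 sites.
sample = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
t2 = mean_pairwise_differences(sample)
t1 = segregating_sites_estimate(sample)
print(t2, t1, t2 - t1)
```

The difference printed last is the unstandardized quantity on which a Tajima-type test is built; the standardization and the asymptotic theory are the subject of the chapter itself and are not reproduced here.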

Chapter information

N. Balakrishnan, Edsel A. Peña and Mervyn J. Silvapulle, eds., Beyond Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen (Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2008), 377-389

First available in Project Euclid: 1 April 2008


Primary: 62G10: Hypothesis testing; 62G20: Asymptotic properties
Secondary: 62P10: Applications to biology and medical sciences

asymptotic normality; U-statistics; population evolution

Copyright © 2008, Institute of Mathematical Statistics


Pinheiro, Aluísio; Pinheiro, Hildete P.; Kiihl, Samara. An asymptotically normal test for the selective neutrality hypothesis. Beyond Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen, 377--389, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2008. doi:10.1214/193940307000000293.



  • [1] Arvesen, J. N. (1969). Jackknifing U-statistics. Ann. Math. Statist. 40 2076–2100.
  • [2] Chakraborty, R. and Rao, C. R. (1991). Measurement of genetic variation for evolutionary studies. In Handbook of Statistics (C. R. Rao and R. Chakraborty, eds.) 8. North-Holland, Amsterdam.
  • [3] Feller, W. (1971). An Introduction to Probability Theory and Its Applications. II, 2nd ed. Wiley, New York.
  • [4] Fitch, W. M. and Margoliash, E. (1967). A method for estimating the number of invariant amino acid coding positions in a gene using cytochrome c as a model case. Biochem. Genet. 1 65–71.
  • [5] Fu, Y.-X. (1994). A phylogenetic estimator of effective population size or mutation rate. Genetics 136 685–692.
  • [6] Fu, Y.-X. (1995). Statistical properties of segregating sites. Theoretical Population Biology 48 172–197.
  • [7] Fu, Y.-X. and Li, W. H. (1993). Statistical tests of neutrality of mutations. Genetics 133 693–709.
  • [8] Gini, C. W. (1912). Variabilità e mutabilità. Studi Economico-Giuridici della R. Università di Cagliari 3 3–159.
  • [9] Hartl, D. and Clark, A. (1997). Principles of Population Genetics, 3rd ed. Sinauer Associates, Sunderland, Massachusetts.
  • [10] Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statistics 19 293–325.
  • [11] Holmes, E. C. and Brown, A. J. (1992). Convergent and divergent sequence evolution in the surface envelope glycoprotein of human immunodeficiency virus type 1 within a single infected patient. PNAS 89 4835–4839.
  • [12] Jukes, T. H. and Cantor, C. R. (1969). Evolution of protein molecules. In Mammalian Protein Metabolism III (H. N. Munro, ed.) 21–132. Academic Press, New York.
  • [13] Kiihl, S. F. (2005). Análise estatística de polimorfismo molecular em seqüências de DNA utilizando informações filogenéticas. Master’s thesis, Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica.
  • [14] Korolyuk, V. S. and Borovskikh, Yu. V. (1985). Approximation of nondegenerate U-statistics. J. Theory Probab. Appl. 30 417–426.
  • [15] Pinheiro, A., Pinheiro, H. P. and Sen, P. K. (2005). The use of Hamming distance in bioinformatics. In Handbook of Statist. Bioinformatics. To appear.
  • [16] Pinheiro, A., Sen, P. K. and Pinheiro, H. P. (2006). Decomposability of high-dimensional diversity measures: quasi U-statistics, martingales and nonstandard asymptotics. Submitted for publication.
  • [17] Pinheiro, H. P., Seillier-Moiseiwitsch, F., Sen, P. K. and Eron, Jr., J. (2000). Genomic sequences and quasi-multivariate CATANOVA. In Bioenvironmental and Public Health Statistics. Handbook of Statist. 18 713–746. North-Holland, Amsterdam.
  • [18] Pinheiro, H. P., Seillier-Moiseiwitsch, F. and Sen, P. K. (2001). Analysis of variance for Hamming distance applied to unbalanced designs. Technical Report 30/01, Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica, Brazil.
  • [19] Pinheiro, H. P., Pinheiro, A. and Sen, P. K. (2005). Comparison of genomic sequences using the Hamming distance. J. Statist. Plann. Inference 130 325–339.
  • [20] Rao, C. R. (1982a). Diversity and dissimilarity coefficients: A unified approach. Theoretical Population Biology 21 24–43.
  • [21] Rao, C. R. (1982b). Diversity: Its measurement, decomposition, apportionment and analysis. Sankhyā Ser. A 44 1–22.
  • [22] Sen, P. K. (1960). On some convergence properties of U-statistics. Calcutta Statist. Assoc. Bull. 10 1–18.
  • [23] Sen, P. K. (1999). Utility-oriented Simpson-type indexes and inequality measures. Calcutta Statist. Assoc. Bull. 49 1–22.
  • [24] Sen, P. K. (2001). Excursions in biostochastics: Biometry to biostatistics to bioinformatics. Lecture Notes, Academia Sinica Inst. Statist. Sci. Taipei, ROC.
  • [25] Sen, P. K. (2006). Robust statistical inference for high-dimensional data models with applications to genomics. Austrian J. Statist. Probab. 36 197–211.
  • [26] Sen, P. K., Tsai, M.-T. and Jou, Y.-S. (2005). High dimension low sample size perspectives in constrained statistical inference: The SARSCoV RNA genome in illustration. Submitted for publication.
  • [27] Simpson, E. H. (1949). The Measurement of Diversity. Nature 163 688.
  • [28] Souza, F. L., Cunha, A. F., Oliveira, M. A., Pereira, G. A. G. and Reis, S. F. (2003). Preliminary phylogeographic analysis of the neotropical freshwater turtle Hydromedusa maximiliani (Chelidae). J. Herpetology 37 199–205.
  • [29] Tajima, F. (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123 585–595.
  • [30] Tihomirov, A. N. (1980). Convergence rate in the central limit theorem for weakly dependent random variables. J. Theory Probab. Appl. 25 800–818.
  • [31] Utev, S. A. (1990). On the central limit theorem for ϕ-mixing arrays of random variables. J. Theory Probab. Appl. 35 131–139.
  • [32] Uzzell, T. and Corbin, K. W. (1971). Fitting discrete probability distributions to evolutionary events. Science 172 1089–1096.
  • [33] Withers, C. S. (1981). Central limit theorems for dependent variables. I. Z. Wahrsch. Verw. Gebiete 57 509–534.
  • [34] Yang, Z. (1996). Among-site variation and its impact on phylogenetic analyses. TREE 11 367–372.
  • [35] Yoshihara, K.-I. (1984). The Berry-Esseen theorems for U-statistics generated by absolutely regular processes. Yokohama Math. J. 32 89–111.