Electronic Journal of Statistics

Efficient methods for the estimation of the multinomial parameter for the two-trait group testing model

Gregory Haber and Yaakov Malinovsky

Full-text: Open access

Abstract

Estimation of a single Bernoulli parameter using pooled sampling is among the oldest problems in the group testing literature. To carry out such estimation, an array of efficient estimators has been introduced, covering a wide range of situations routinely encountered in applications. More recently, there has been growing interest in using group testing to simultaneously estimate the joint probabilities of two correlated traits using a multinomial model. Unfortunately, basic estimation results, such as the maximum likelihood estimator (MLE), have not been adequately addressed in the literature for such cases. In this paper, we show that finding the MLE for this problem is equivalent to maximizing a multinomial likelihood with a restricted parameter space. A solution using the EM algorithm is presented that is guaranteed to converge to the global maximizer, even when that maximizer lies on the boundary of the parameter space. Two additional closed-form estimators are presented with the goal of minimizing the bias and/or mean square error. The methods are illustrated with an application to the joint estimation of transmission prevalence for two strains of Potato virus Y by the aphid Myzus persicae.
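To make the restricted-parameter-space point concrete, here is a minimal sketch; it is not the authors' code, and it does not reproduce their closed-form estimators or their boundary convergence guarantee. It assumes error-free tests, each pool tested once for each trait, and individual types (0,0), (1,0), (0,1), (1,1) with probabilities p00, p10, p01, p11. Under those assumptions a pool of size k tests (0,0) with probability p00^k, (1,0) with probability (p00+p10)^k - p00^k, (0,1) with probability (p00+p01)^k - p00^k, and (1,1) with the remainder. The function names (`pool_prob`, `expected_counts`, `em`, `inversion`) and the input format (pool size, outcome, number of pools) are hypothetical. The naive `inversion` solves these equations directly from observed pool frequencies and can return a negative cell; the generic EM iteration always stays inside the probability simplex.

```python
# Hypothetical sketch: EM for the joint (two-trait) multinomial parameter from
# pooled data, assuming error-free tests. Not the authors' published code.

# Individual types (trait 1, trait 2): the four cells of the multinomial parameter.
TYPES = [(0, 0), (1, 0), (0, 1), (1, 1)]


def pool_prob(y, p, m):
    """P(a pool of m individuals tests y = (y1, y2)); p maps type -> probability."""
    c = p[(0, 0)]              # individual carries neither trait
    a = c + p[(1, 0)]          # individual lacks trait 2
    b = c + p[(0, 1)]          # individual lacks trait 1
    return {
        (0, 0): c ** m,
        (1, 0): a ** m - c ** m,
        (0, 1): b ** m - c ** m,
        (1, 1): 1 - a ** m - b ** m + c ** m,
    }[y]


def expected_counts(y, k, p):
    """E-step: expected number of each type in a size-k pool that tested y."""
    py = pool_prob(y, p, k)
    out = {}
    for t in TYPES:
        # P(the other k - 1 members, combined with one member of type t, give y)
        rest = sum(pool_prob(z, p, k - 1) for z in TYPES
                   if (z[0] | t[0], z[1] | t[1]) == y)
        out[t] = k * p[t] * rest / py
    return out


def em(data, n_iter=500, tol=1e-10):
    """data: list of (pool_size, outcome, number_of_pools)."""
    p = {t: 0.25 for t in TYPES}              # interior starting value
    total = sum(k * n for k, _, n in data)    # individuals tested
    for _ in range(n_iter):
        acc = {t: 0.0 for t in TYPES}
        for k, y, n in data:
            for t, e in expected_counts(y, k, p).items():
                acc[t] += n * e
        new = {t: acc[t] / total for t in TYPES}  # M-step: complete-data MLE
        done = max(abs(new[t] - p[t]) for t in TYPES) < tol
        p = new
        if done:
            break
    return p


def inversion(counts, k):
    """Naive closed-form inversion of pool-level frequencies (common size k)."""
    n = sum(counts.values())
    q = {y: counts.get(y, 0) / n for y in TYPES}
    c = q[(0, 0)] ** (1 / k)
    a = (q[(0, 0)] + q[(1, 0)]) ** (1 / k)
    b = (q[(0, 0)] + q[(0, 1)]) ** (1 / k)
    return {(0, 0): c, (1, 0): a - c, (0, 1): b - c, (1, 1): 1 - a - b + c}


if __name__ == "__main__":
    # Hypothetical data: 100 pools of size 2; outcome -> number of pools.
    counts = {(0, 0): 10, (1, 0): 45, (0, 1): 45, (1, 1): 0}
    print(inversion(counts, 2))  # (1, 1) cell is about -0.167: outside the simplex
    print(em([(2, y, n) for y, n in counts.items()]))  # (1, 1) cell is exactly 0.0
```

The E-step conditions on a single pool member: the expected count of type t is k * p_t * P(pool outcome | that member is type t) / P(pool outcome), so the four expected counts in each pool always sum to k. When the inversion lands inside the simplex (for example, pool frequencies generated exactly by p = (0.7, 0.1, 0.1, 0.1) with k = 2), the two approaches should agree.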

Article information

Source
Electron. J. Statist., Volume 13, Number 2 (2019), 2624-2657.

Dates
Received: July 2018
First available in Project Euclid: 14 August 2019

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1565748203

Digital Object Identifier
doi:10.1214/19-EJS1583

Mathematical Reviews number (MathSciNet)
MR3992500

Zentralblatt MATH identifier
07104726

Keywords
EM algorithm, group testing, multinomial sampling, restricted parameter space

Rights
Creative Commons Attribution 4.0 International License.

Citation

Haber, Gregory; Malinovsky, Yaakov. Efficient methods for the estimation of the multinomial parameter for the two-trait group testing model. Electron. J. Statist. 13 (2019), no. 2, 2624–2657. doi:10.1214/19-EJS1583. https://projecteuclid.org/euclid.ejs/1565748203


