Statistical Science

Eliminating multiple root problems in estimation (with comments by John J. Hanfelt, C. C. Heyde and Bing Li, and a rejoinder by the authors)

Christopher G. Small, Jinfang Wang, and Zejiang Yang



Estimating functions, such as the score or quasiscore, can have more than one root. In many of these cases, theory tells us that there is a unique consistent root of the estimating function. However, in practice, there may be considerable doubt as to which root is appropriate as a parameter estimate. The problem is of practical importance to data analysts and theoretically challenging as well. In this paper, we review the literature on this problem. A variety of examples are provided to illustrate the diversity of situations in which multiple roots can arise. Some methods are suggested to investigate the possibility of multiple roots, search for all roots and compute the distributions of the roots. Various approaches are discussed for selecting among the roots. These methods include (1) iterating from consistent estimators, (2) examining the asymptotics when explicit formulas for roots are available, (3) testing the consistency of each root, (4) selecting by bootstrapping and (5) using information-theoretic methods for certain parametric models. As an alternative approach to the problem, we consider how an estimating function can be modified to reduce the number of roots. Finally, we survey some techniques of artificial likelihoods for semiparametric models and discuss their relationship to the multiple root problem.
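To make the multiple root problem concrete, the following is a minimal sketch, not taken from the paper, using the score of the Cauchy location model with unit scale. The two-point sample, the grid scan, and the helper names (`cauchy_score`, `all_roots`, `newton`) are all assumptions chosen for illustration. For observations at -a and a with a > 1, the score equation can be solved by hand and has roots at 0 and at plus or minus sqrt(a^2 - 1), so the numerical search can be checked against a known answer.

```python
import math

def cauchy_score(theta, xs):
    """Score (derivative of the log-likelihood in theta) for the
    Cauchy location model with unit scale."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)

def cauchy_score_deriv(theta, xs):
    """Derivative of the score; a negative value at a root marks a
    local maximum of the log-likelihood."""
    return sum(2.0 * ((x - theta) ** 2 - 1.0) / (1.0 + (x - theta) ** 2) ** 2
               for x in xs)

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection on an interval where f changes sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0:
            return mid
        if (f_lo < 0) != (f_mid < 0):
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def all_roots(xs, n_grid=2000):
    """Scan a grid for sign changes of the score and refine each one.
    All roots lie between min(xs) and max(xs), because the score is
    positive to the left of the data and negative to the right.
    A grid that is too coarse can miss closely spaced roots."""
    lo, hi = min(xs) - 1.0, max(xs) + 1.0
    f = lambda t: cauchy_score(t, xs)
    grid = [lo + (hi - lo) * i / n_grid for i in range(n_grid + 1)]
    roots = []
    for a, b in zip(grid, grid[1:]):
        fa, fb = f(a), f(b)
        if fa == 0.0:            # grid point happens to be an exact root
            roots.append(a)
        elif fa * fb < 0:        # strict sign change inside (a, b)
            roots.append(bisect(f, a, b))
    return roots

def newton(theta, xs, tol=1e-12, max_iter=100):
    """Newton-Raphson iteration on the score from a given starting value."""
    for _ in range(max_iter):
        step = cauchy_score(theta, xs) / cauchy_score_deriv(theta, xs)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Hypothetical two-point sample (a = 3), chosen so the roots are known.
xs = [-3.0, 3.0]
roots = all_roots(xs)
for r in roots:
    kind = "local max" if cauchy_score_deriv(r, xs) < 0 else "local min"
    print(f"root {r:+.6f}  ({kind} of the log-likelihood)")

# The root Newton-Raphson reaches depends on where the iteration starts.
for start in (-2.5, 0.1, 2.5):
    print(f"start {start:+.1f} -> {newton(start, xs):+.6f}")
```

In this toy sample the middle root is a local minimum of the log-likelihood, yet an iteration started close to it still converges there. That is one reason the starting value matters in practice.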

Article information

Statist. Sci., Volume 15, Number 4 (2000), 313-341.

First available in Project Euclid: 24 December 2001


Keywords: Bootstrapping; consistent root; estimating functions; likelihood; multiple roots; Newton–Raphson iteration; parameter; quasilikelihood


Small, Christopher G.; Wang, Jinfang; Yang, Zejiang. Eliminating multiple root problems in estimation (with comments by John J. Hanfelt, C. C. Heyde and Bing Li, and a rejoinder by the authors). Statist. Sci. 15 (2000), no. 4, 313-341. doi:10.1214/ss/1009213001.


