## Electronic Journal of Statistics

### Integrated Cumulative Error (ICE) distance for non-nested mixture model selection: Application to extreme values in metal fatigue problems

#### Abstract

In this paper, we consider the problem of selecting the most appropriate model, among a given collection of mixture models, to describe datasets likely drawn from mixtures of distributions. The proposed method consists of finding the quasi-maximum likelihood estimators (QMLEs) of the competing models, using Expectation-Maximization (EM) type algorithms, and subsequently estimating, for every model, a statistical distance to the true model based on the empirical cumulative distribution function (cdf) of the original dataset and the QMLE-fitted cdf. To evaluate the goodness of fit, a new metric, the Integrated Cumulative Error ($ICE$), is proposed and compared with existing metrics for accuracy in detecting the appropriate model. We state, under mild conditions, that our estimator of the $ICE$ distance converges at the rate $\sqrt{n}$ in probability, and that our model selection procedure is consistent (it asymptotically detects the right model). Over a set of benchmark examples, the $ICE$ criterion shows numerically improved performance over existing distance-based criteria in identifying the correct model. The method is applied in a material fatigue life context to model the distribution of indicators of the fatigue crack formation potency, obtained from numerical experiments.
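The fit-then-compare workflow described in the abstract can be illustrated with a small sketch. The abstract does not give the formula for the $ICE$ distance, so the score below is a stand-in chosen for illustration: the integral of the absolute difference between the empirical cdf and the fitted cdf over the data range. The function names (`fit_single_normal`, `fit_two_normal_em`, `integrated_cdf_error`, `select_model`), the normal components, the simulated data, and the two candidate models are all assumptions and are not taken from the paper. For brevity the two candidates are nested, whereas the paper addresses non-nested collections.

```python
import bisect
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fit_single_normal(data):
    """Maximum likelihood fit of one normal, returned as a one-component mixture."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return [(1.0, mu, sigma)]


def fit_two_normal_em(data, iters=200):
    """Plain EM for a two-component normal mixture (no restarts, illustrative only)."""
    s = sorted(data)
    n = len(s)
    # Initialise each component from one half of the sorted sample.
    lower, upper = fit_single_normal(s[: n // 2])[0], fit_single_normal(s[n // 2:])[0]
    w = [0.5, 0.5]
    mu = [lower[1], upper[1]]
    sd = [max(lower[2], 1e-3), max(upper[2], 1e-3)]
    for _ in range(iters):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in data:
            p = [w[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            tot = (p[0] + p[1]) or 1e-300
            resp.append((p[0] / tot, p[1] / tot))
        # M-step: weighted updates of weights, means and standard deviations.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)
    return [(w[k], mu[k], sd[k]) for k in range(2)]


def mixture_cdf(x, comps):
    return sum(w * normal_cdf(x, m, s) for w, m, s in comps)


def integrated_cdf_error(data, comps, grid_size=2000):
    """Stand-in score (assumption): midpoint-rule integral of |F_n - F_fit| over the data range."""
    s = sorted(data)
    n = len(s)
    lo, hi = s[0], s[-1]
    step = (hi - lo) / grid_size
    total = 0.0
    for j in range(grid_size):
        x = lo + (j + 0.5) * step
        ecdf = bisect.bisect_right(s, x) / n  # empirical cdf at x
        total += abs(ecdf - mixture_cdf(x, comps)) * step
    return total


def select_model(data, candidates):
    """Fit every candidate and return the name with the smallest score, plus all scores."""
    scores = {name: integrated_cdf_error(data, fit(data)) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


random.seed(0)
# Simulated sample from a well-separated two-component normal mixture.
data = [random.gauss(0.0, 1.0) for _ in range(300)]
data += [random.gauss(6.0, 1.0) for _ in range(300)]

candidates = {"one_normal": fit_single_normal, "two_normal_mixture": fit_two_normal_em}
best, scores = select_model(data, candidates)
print(best, scores)
```

On this simulated sample the two-component candidate receives the smaller score, because a single normal cannot follow the plateau of the empirical cdf between the two modes. The sketch says nothing about the convergence rate or consistency results stated in the abstract.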

#### Article information

Source
Electron. J. Statist., Volume 8, Number 2 (2014), 3141-3175.

Dates
First available in Project Euclid: 22 January 2015

https://projecteuclid.org/euclid.ejs/1421936897

Digital Object Identifier
doi:10.1214/15-EJS985

Mathematical Reviews number (MathSciNet)
MR3303680

Zentralblatt MATH identifier
1308.62134

Subjects
Primary: 62G05: Estimation; 62G20: Asymptotic properties
Secondary: 62E10: Characterization and structure theory

#### Citation

Vandekerkhove, P.; Padbidri, J. M.; McDowell, D. L. Integrated Cumulative Error (ICE) distance for non-nested mixture model selection: Application to extreme values in metal fatigue problems. Electron. J. Statist. 8 (2014), no. 2, 3141-3175. doi:10.1214/15-EJS985. https://projecteuclid.org/euclid.ejs/1421936897
