We provide nonasymptotic excess risk guarantees for statistical learning in a setting where the population risk with respect to which we evaluate the target parameter depends on an unknown nuisance parameter that must be estimated from data. We analyze a two-stage sample splitting meta-algorithm that takes as input arbitrary estimation algorithms for the target parameter and nuisance parameter. We show that if the population risk satisfies a condition called Neyman orthogonality, the impact of the nuisance estimation error on the excess risk bound achieved by the meta-algorithm is of second order. Our theorem is agnostic to the particular algorithms used for the target and nuisance and only makes an assumption on their individual performance. This enables the use of a plethora of existing results from machine learning to give new guarantees for learning with a nuisance component. Moreover, by focusing on excess risk rather than parameter estimation, we can provide rates under weaker assumptions than in previous works and accommodate settings in which the target parameter belongs to a complex nonparametric class. We provide conditions on the metric entropy of the nuisance and target classes such that oracle rates of the same order, as if we knew the nuisance parameter, are achieved.
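As a minimal illustration of the two-stage sample-splitting idea (a sketch of our own, not the paper's meta-algorithm), consider the partially linear model Y = theta0*T + g0(X) + noise: the nuisances E[Y|X] and E[T|X] are estimated on one fold, and the Neyman-orthogonal residual-on-residual risk is minimized for the target theta on the other fold. The helper names and the polynomial nuisance estimator are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated partially linear model: Y = theta0*T + g0(X) + noise,
# with T correlated with X so the nuisance genuinely matters.
n, theta0 = 4000, 2.0
X = rng.uniform(-1, 1, n)
T = np.sin(np.pi * X) + 0.5 * rng.standard_normal(n)
Y = theta0 * T + np.cos(np.pi * X) + 0.3 * rng.standard_normal(n)

def fit_poly(x, y, deg=5):
    """Nuisance estimator: polynomial least squares (any ML regressor works here)."""
    c = np.polyfit(x, y, deg)
    return lambda xnew: np.polyval(c, xnew)

# Stage 1 (fold A): estimate the nuisances E[Y|X] and E[T|X].
idx = rng.permutation(n)
fold_a, fold_b = idx[: n // 2], idx[n // 2 :]
m_hat = fit_poly(X[fold_a], Y[fold_a])  # estimate of E[Y|X]
e_hat = fit_poly(X[fold_a], T[fold_a])  # estimate of E[T|X]

# Stage 2 (fold B): minimize the Neyman-orthogonal empirical risk, i.e.
# regress the Y-residual on the T-residual (nuisance estimation errors
# enter the excess risk only at second order).
ry = Y[fold_b] - m_hat(X[fold_b])
rt = T[fold_b] - e_hat(X[fold_b])
theta_hat = np.dot(rt, ry) / np.dot(rt, rt)
```

Because of orthogonality, even a crude nuisance fit leaves theta_hat close to theta0.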
We derive minimax testing errors in a distributed framework where the data is split over multiple machines and their communication to a central machine is limited to b bits. We investigate both the d-dimensional and the infinite-dimensional signal detection problems under Gaussian white noise. We also derive distributed testing algorithms that attain the theoretical lower bounds.
Our results show that distributed testing is subject to fundamentally different phenomena that are not observed in distributed estimation. Among our findings we show that testing protocols that have access to shared randomness can perform strictly better in some regimes than those that do not. We also observe that consistent nonparametric distributed testing is always possible, even with as little as one bit of communication, and the corresponding test outperforms the best local test using only the information available at a single local machine. Furthermore, we also derive adaptive nonparametric distributed testing strategies and the corresponding theoretical lower bounds.
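A toy version of a one-bit distributed test (our own illustration, not the paper's protocol): each machine transmits a single bit indicating whether its local mean is positive, and the central machine rejects when the bit count is improbably large under the null.

```python
import numpy as np

def one_bit_distributed_test(data_per_machine):
    """Each machine sends a single bit: whether its local mean is positive.
    Under H0 (centered data), each bit is Bernoulli(1/2); the center rejects
    when the bit count exceeds a normal-approximation binomial quantile."""
    bits = np.array([x.mean() > 0 for x in data_per_machine], dtype=int)
    m = len(bits)
    z = (bits.sum() - m / 2) / np.sqrt(m / 4)  # standardized Binomial(m, 1/2)
    return z > 1.645  # one-sided test at level ~0.05

rng = np.random.default_rng(1)
m, n = 100, 50
null_data = [rng.standard_normal(n) for _ in range(m)]       # H0: N(0, 1)
alt_data = [0.5 + rng.standard_normal(n) for _ in range(m)]  # H1: N(0.5, 1)
```

Even with one bit per machine, the aggregated test detects a signal that each local machine sees only weakly.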
Motivated by crowdsourcing applications, we consider a model in which we have partial observations from a bivariate isotonic matrix with an unknown permutation acting on its rows. Focusing on the twin problems of recovering the permutation and estimating the unknown matrix, we introduce a polynomial-time procedure achieving the minimax risk for both problems, and we do so for all possible values of n and d and all possible sampling efforts. Along the way, we establish that, in some regimes, recovering the unknown permutation is considerably simpler than estimating the matrix.
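A toy caricature of the permutation-recovery stage (a simplification of ours, not the paper's minimax procedure, and with full rather than partial observations): in a bivariate isotonic matrix the row sums are monotone in the latent row score, so ranking observed row sums already estimates the unknown permutation.

```python
import numpy as np

rng = np.random.default_rng(2)

# A bivariate isotonic matrix: entries nondecreasing along rows and columns.
n, d = 30, 40
M = np.add.outer(np.sort(rng.uniform(size=n)), np.sort(rng.uniform(size=d)))

pi = rng.permutation(n)                        # unknown row permutation
Y = M[pi] + 0.1 * rng.standard_normal((n, d))  # noisy (fully observed) data

# Toy first stage: rank rows by their observed row sums; these are monotone
# in the latent row score, hence informative about the permutation.
order = np.argsort(Y.sum(axis=1))
pi_hat = np.empty(n, dtype=int)
pi_hat[order] = np.arange(n)  # pi_hat[i]: estimated latent rank of row i
```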
Positive dependence is present in many real-world data sets and has appealing stochastic properties that can be exploited in statistical modeling and in estimation. In particular, the notion of multivariate total positivity of order 2 (MTP2) is a convex constraint and acts as an implicit regularizer in the Gaussian case. We study positive dependence in multivariate extremes and introduce EMTP2, an extremal version of MTP2. This notion turns out to appear prominently in extremes, and in fact, it is satisfied by many classical models. For a Hüsler–Reiss distribution, the analogue of a Gaussian distribution in extremes, we show that it is EMTP2 if and only if its precision matrix is the Laplacian of a connected graph. We propose an estimator for the parameters of the Hüsler–Reiss distribution under EMTP2 as the solution of a convex optimization problem with a Laplacian constraint. We prove that this estimator is consistent and typically yields a sparse model with a possibly nondecomposable extremal graphical structure. Applying our methods to a data set of Danube River flows, we illustrate this regularization and the superior performance compared to existing methods.
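The stated characterization (the precision matrix being the Laplacian of a connected graph) can be checked mechanically; a small sketch with our own helper name:

```python
import numpy as np

def is_connected_laplacian(Theta, tol=1e-9):
    """Check the Laplacian characterization: symmetric, nonpositive
    off-diagonals, zero row sums, and the implied graph (edges where the
    off-diagonal is nonzero) connected, via the Fiedler eigenvalue."""
    Theta = np.asarray(Theta, dtype=float)
    if not np.allclose(Theta, Theta.T, atol=tol):
        return False
    off = Theta - np.diag(np.diag(Theta))
    if (off > tol).any():                                # off-diagonals <= 0
        return False
    if not np.allclose(Theta.sum(axis=1), 0, atol=tol):  # Laplacian row sums
        return False
    # Connectivity: second-smallest eigenvalue (Fiedler value) must be > 0.
    eigvals = np.sort(np.linalg.eigvalsh(Theta))
    return eigvals[1] > tol

# Path graph on 4 nodes: a valid connected-graph Laplacian.
L_path = np.array([[ 1, -1,  0,  0],
                   [-1,  2, -1,  0],
                   [ 0, -1,  2, -1],
                   [ 0,  0, -1,  1]], dtype=float)
# Two disconnected edges: a Laplacian, but not of a connected graph.
L_disc = np.array([[ 1, -1,  0,  0],
                   [-1,  1,  0,  0],
                   [ 0,  0,  1, -1],
                   [ 0,  0, -1,  1]], dtype=float)
```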
Variable selection properties of procedures utilizing penalized-likelihood estimates are a central topic in the study of high-dimensional linear regression problems. The existing literature emphasizes the quality of ranking of the variables by such procedures, as reflected in the receiver operating characteristic curve or in prediction performance. Specifically, recent works have harnessed modern theory of approximate message-passing (AMP) to obtain, in a particular setting, exact asymptotic predictions of the type I, type II error tradeoff for selection procedures that rely on ℓ1-regularized estimators.
In practice, effective ranking by itself is often not sufficient because some calibration for type I error is required. In this work, we study theoretically the power of selection procedures that similarly rank the features by the size of an ℓ1-regularized estimator, but further use Model-X knockoffs to control the false discovery rate in the realistic situation where no prior information about the signal is available. In analyzing the power of the resulting procedure, we extend existing results in AMP theory to handle the pairing between original variables and their knockoffs. This is used to derive exact asymptotic predictions for power. We apply the general results to compare the power of the knockoffs versions of Lasso and thresholded-Lasso selection, and demonstrate that, in the i.i.d. covariate setting under consideration, tuning by cross-validation on the augmented design matrix is nearly optimal. We further demonstrate how the techniques also allow us to analyze the type S error, and a corresponding notion of power, when selections are supplemented with a decision on the sign of the coefficient.
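The selection step above can be sketched with the standard knockoff+ threshold applied to feature statistics W_j (e.g. the difference of the Lasso coefficient magnitudes of a variable and its knockoff); the toy statistics below stand in for actual Lasso fits:

```python
import numpy as np

def knockoff_plus_select(W, q=0.1):
    """Knockoff+ selection: choose the smallest threshold t among |W_j| with
    (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q, select {j : W_j >= t}."""
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)  # no threshold achieves the bound

# Toy statistics: large positive W for true signals, near-symmetric noise.
W = np.array([5.0, 4.2, 3.8, 3.5, 3.1, 0.3, -0.2, 0.1, -0.4, 0.2])
sel = knockoff_plus_select(W, q=0.2)
```

With these statistics the rule keeps exactly the five strong positives.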
Tie-breaker designs trade off a measure of statistical efficiency against a short-term gain from preferentially assigning a binary treatment to subjects with higher values of a running variable x. The efficiency measure can be any continuous function of the expected information matrix in a two-line regression model. The short-term gain is expressed as the covariance between the running variable and the treatment indicator. We investigate how to choose design functions p(x), specifying the probability of treating a subject with running variable x, in order to optimize these competing objectives, under external constraints on the number of subjects receiving treatment. Our results include sharp existence and uniqueness guarantees, while accommodating the ethically appealing requirement that p(x) be nondecreasing in x. Under this condition, there is always an optimal treatment probability function that is constant on the sets {x < t} and {x > t} for some threshold t, and generally discontinuous at x = t. When the running variable distribution is not symmetric or the fraction of subjects receiving the treatment is not 1/2, our optimal designs improve upon a D-optimality objective without sacrificing short-term gain, compared to a typical three-level tie-breaker design that fixes treatment probabilities at 0, 1/2, and 1. We illustrate our optimal designs with data from Head Start, an early childhood government intervention program.
Structure learning via MCMC sampling is known to be very challenging because of the enormous search space and the existence of Markov equivalent DAGs. Theoretical results on the mixing behavior are lacking. In this work, we prove the rapid mixing of a random walk Metropolis–Hastings algorithm, which reveals that the complexity of Bayesian learning of sparse equivalence classes grows only polynomially in n and p, under some high-dimensional assumptions. A series of high-dimensional consistency results is obtained, including the strong selection consistency of an empirical Bayes model for structure learning. Our proof is based on two new results. First, we derive a general mixing time bound on finite-state spaces, which can be applied to local MCMC schemes for other model selection problems. Second, we construct high-probability search paths on the space of equivalence classes with node degree constraints by proving a combinatorial property of DAG comparisons. Simulation studies on the proposed MCMC sampler are conducted to illustrate the main theoretical findings.
We study the estimation of the reach, a ubiquitous regularity parameter in manifold estimation and geometric data analysis. Given an i.i.d. sample over an unknown d-dimensional smooth submanifold M of ℝ^D, we provide optimal nonasymptotic bounds for the estimation of its reach. We build upon a formulation of the reach in terms of maximal curvature on the one hand and geodesic metric distortion on the other. The derived rates are adaptive, depending on whether the reach of M arises from curvature or from a bottleneck structure. In the process, we derive optimal geodesic metric estimation bounds.
Determining the precise rank is an important problem in many large-scale applications with matrix data exploiting low-rank plus noise models. In this paper, we suggest a universal approach to rank inference via residual subsampling (RIRS) for testing and estimating rank in a wide family of models, including many popularly used network models such as the degree-corrected mixed membership model as a special case. Our procedure constructs a test statistic via subsampling entries of the residual matrix after extracting the spiked components. The test statistic converges in distribution to the standard normal under the null hypothesis, and diverges to infinity with asymptotic probability one under the alternative hypothesis. The effectiveness of the RIRS procedure is justified theoretically, utilizing the asymptotic expansions of eigenvectors and eigenvalues for large random matrices recently developed in (J. Amer. Statist. Assoc. 117 (2022) 996–1009) and (J. R. Stat. Soc. Ser. B. Stat. Methodol. 84 (2022) 630–653). The advantages of the newly suggested procedure are demonstrated through several simulation and real data examples.
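A simplified caricature of the residual-subsampling idea (our own sketch, not the calibrated RIRS statistic): remove the top-k spiked components by SVD, subsample residual entries, and standardize their sum, which should behave like a standard normal when the true rank is at most k.

```python
import numpy as np

rng = np.random.default_rng(3)

def residual_subsampling_statistic(X, k, m, rng):
    """Subtract the rank-k spiked part, subsample m residual entries, and
    return their standardized sum (roughly N(0,1) if true rank <= k)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    resid = X - (U[:, :k] * s[:k]) @ Vt[:k]
    i = rng.integers(0, X.shape[0], size=m)
    j = rng.integers(0, X.shape[1], size=m)
    e = resid[i, j]
    return e.sum() / (np.sqrt(m) * e.std())

# Rank-1 spike plus noise; removing one component leaves noise-like residuals.
n = 200
spike = 50 * np.outer(rng.standard_normal(n), rng.standard_normal(n)) / n
X = spike + rng.standard_normal((n, n))
t_null = residual_subsampling_statistic(X, k=1, m=500, rng=rng)
```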
The class of Gibbs point processes (GPP) is a large class of spatial point processes able to model both clustered and repulsive point patterns. They are specified by their conditional intensity, which, for a point pattern x and a location u, is roughly speaking the probability that an event occurs in an infinitesimal ball around u given that the rest of the configuration is x. The simplest and most natural class of models is the class of pairwise interaction point processes, where the conditional intensity depends on the number of points and the pairwise distances between them. This paper is concerned with the problem of estimating the pairwise interaction function nonparametrically. We propose to estimate it using an orthogonal series expansion of its logarithm. Such an approach has numerous advantages compared to existing ones. The estimation procedure is simple, fast and completely data-driven. We establish asymptotic properties such as consistency and asymptotic normality, demonstrate the efficiency of the procedure through simulation experiments, and illustrate it on several data sets.
We herein establish an asymptotic representation theorem for locally asymptotically normal quantum statistical models. This theorem enables us to study the asymptotic efficiency of quantum estimators, such as quantum regular estimators and quantum minimax estimators, leading to a universal tight lower bound beyond the i.i.d. assumption. This formulation complements the theory of quantum contiguity developed in the previous paper [Fujiwara and Yamagata, Bernoulli 26 (2020) 2105–2141], providing a solid foundation for the theory of weak quantum local asymptotic normality.
In this paper, we derive the limit of experiments for one-parameter Ising models on dense regular graphs. In particular, we show that the limiting experiment is Gaussian in the “low temperature” regime, and non-Gaussian in the “critical” regime. We also derive the limiting distributions of the maximum likelihood and maximum pseudolikelihood estimators, and study limiting power for tests of hypothesis against contiguous alternatives. To the best of our knowledge, this is the first attempt at establishing the classical limits of experiments for Ising models (and more generally, Markov random fields).
Understanding the time-varying structure of complex temporal systems is one of the main challenges of modern time-series analysis. In this paper, we show that every uniformly-positive-definite-in-covariance and sufficiently short-range dependent nonstationary and nonlinear time series can be well approximated globally by a white-noise-driven autoregressive (AR) process of slowly diverging order. To the best of our knowledge, this is the first time such a structural approximation result has been established for general classes of nonstationary time series. A high-dimensional test and an associated multiplier bootstrap procedure are proposed for the inference of the AR approximation coefficients. In particular, an adaptive stability test is proposed to check whether the AR approximation coefficients are time-varying, a frequently encountered question for practitioners and researchers of time series. As an application, globally optimal short-term forecasting theory and methodology for a wide class of locally stationary time series are established via the method of sieves.
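The AR-approximation idea can be sketched by fitting an AR(p) with slowly growing order p (e.g. p of order log n) by least squares on lagged values; the AR(1) example here is ours:

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_ar(x, p):
    """Least-squares AR(p) fit: regress x_t on (x_{t-1}, ..., x_{t-p})."""
    n = len(x)
    Z = np.column_stack([x[p - j - 1 : n - j - 1] for j in range(p)])
    coef, *_ = np.linalg.lstsq(Z, x[p:], rcond=None)
    return coef

# Example: an AR(1) series, approximated by an AR(p) with slowly growing p.
n, phi = 5000, 0.6
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()
p = int(np.log(n))   # slowly diverging sieve order (here p = 8)
coef = fit_ar(x, p)  # coef[0] should be close to phi, the rest close to 0
```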
In high-dimensional classification problems, a commonly used approach is to first project the high-dimensional features into a lower-dimensional space, and base the classification on the resulting lower-dimensional projections. In this paper, we formulate a latent-variable model with a hidden low-dimensional structure to justify this two-step procedure and to guide which projection to choose. We propose a computationally efficient classifier that takes certain principal components (PCs) of the observed features as projections, with the number of retained PCs selected in a data-driven way. A general theory is established for analyzing such two-step classifiers based on any projections. We derive explicit rates of convergence of the excess risk of the proposed PC-based classifier. The obtained rates are further shown to be optimal up to logarithmic factors in the minimax sense. Our theory allows the lower dimension to grow with the sample size and is also valid even when the feature dimension (greatly) exceeds the sample size. Extensive simulations corroborate our theoretical findings. The proposed method also performs favorably relative to other existing discriminant methods on three real data examples.
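A sketch of the two-step projection classifier (our own simplification: the paper selects the number of retained PCs in a data-driven way, whereas we fix it): project onto the top principal components of the training features, then classify by the nearest class centroid in the projected space.

```python
import numpy as np

rng = np.random.default_rng(5)

def pc_classifier(Xtr, ytr, Xte, k):
    """Project onto the top-k principal components of the training features,
    then classify by the nearest class centroid in the projected space."""
    mu = Xtr.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xtr - mu, full_matrices=False)
    P = Vt[:k].T                               # top-k PC directions
    Ztr, Zte = (Xtr - mu) @ P, (Xte - mu) @ P
    classes = np.unique(ytr)
    cents = np.array([Ztr[ytr == c].mean(axis=0) for c in classes])
    dists = ((Zte[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

# Two classes whose separation lives in a single latent direction, p >> 2.
n, p = 200, 50
y = rng.integers(0, 2, size=2 * n)
X = rng.standard_normal((2 * n, p))
X[:, 0] += 3.0 * y                             # class signal in one coordinate
yhat = pc_classifier(X[:n], y[:n], X[n:], k=2)
acc = (yhat == y[n:]).mean()
```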
We consider a space structured population model generated by two point clouds: a homogeneous Poisson process M as a model for a parent generation, together with a Cox point process N as an offspring generation, with conditional intensity given by the convolution of M with a scaled version of a dispersal density f. Based on a realisation of M and N, we study the nonparametric estimation of f and the estimation of the physical scale parameter σ simultaneously, across all scaling regimes. We establish that the optimal rates of convergence do not depend monotonically on the scale, and we construct minimax estimators accordingly, whether σ is known or considered as a nuisance, in which case we can estimate it and achieve asymptotic minimaxity by plug-in. The statistical reconstruction exhibits a competition between a direct and a deconvolution problem. Our study reveals in particular the existence of a least favorable intermediate inference scale, a phenomenon that seems to be new.
Statistical inference from high-dimensional data with low-dimensional structures has recently attracted a lot of attention. In machine learning, deep generative modelling approaches implicitly estimate distributions of complex objects by creating new samples from the underlying distribution, and have achieved great success in generating synthetic realistic-looking images and texts. A key step in these approaches is the extraction of latent features or representations (encoding) that can be used for accurately reconstructing the original data (decoding). In other words, low-dimensional manifold structure is implicitly assumed and utilized in the distribution modelling and estimation. To understand the benefit of low-dimensional manifold structure in generative modelling, we build a general minimax framework for distribution estimation on an unknown submanifold under adversarial losses, with suitable smoothness assumptions on the target distribution and the manifold. The established minimax rate elucidates how various problem characteristics, including intrinsic dimensionality of the data and smoothness levels of the target distribution and the manifold, affect the fundamental limit of high-dimensional distribution estimation. To prove the minimax upper bound, we construct an estimator based on a mixture of locally fitted generative models, which is motivated by the partition of unity technique from differential geometry and is necessary to cover cases where the underlying data manifold does not admit a global parametrization. We also propose a data-driven adaptive estimator that is shown to simultaneously attain rates within a logarithmic factor of the optimal rate over a large collection of distribution classes.
This paper studies inference in linear models with a high-dimensional parameter matrix that can be well approximated by a “spiked low-rank matrix.” A spiked low-rank matrix has rank that grows slowly compared to its dimensions and nonzero singular values that diverge to infinity. We show that this framework covers a broad class of models of latent variables, which can accommodate matrix completion problems, factor models, varying coefficient models and heterogeneous treatment effects. For inference, we apply a procedure that relies on an initial nuclear-norm penalized estimation step followed by two ordinary least squares regressions. We consider the framework of estimating incoherent eigenvectors and use a rotation argument to argue that the eigenspace estimation is asymptotically unbiased. Using this framework, we show that our procedure provides asymptotically normal inference and achieves the semiparametric efficiency bound. We illustrate our framework by providing low-level conditions for its application in a treatment effects context where treatment assignment might be strongly dependent.
We extend extreme value statistics to independent data with possibly very different distributions. In particular, we present novel asymptotic normality results for the Hill estimator, which now estimates the extreme value index of the average distribution. Due to the heterogeneity, the asymptotic variance can be substantially smaller than that in the i.i.d. case. As a special case, we consider a heterogeneous scales model where the asymptotic variance can be calculated explicitly. The primary tool for the proofs is the functional central limit theorem for a weighted tail empirical process. We also present asymptotic normality results for the extreme quantile estimator. A simulation study shows the good finite-sample behavior of our limit theorems. We also present applications to assess the tail heaviness of earthquake energies and of cross-sectional stock market losses.
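The Hill estimator referenced above has a standard closed form; a minimal implementation on simulated Pareto data (the data and tuning choices are ours):

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator of the extreme value index from the k largest order
    statistics: average log-spacing above the (k+1)-th largest value."""
    xs = np.sort(x)[::-1]  # descending order statistics
    return np.mean(np.log(xs[:k]) - np.log(xs[k]))

# Pareto(alpha) data has extreme value index gamma = 1/alpha.
rng = np.random.default_rng(6)
alpha = 2.0
x = rng.pareto(alpha, size=20000) + 1.0  # classical Pareto on [1, inf)
gamma_hat = hill_estimator(x, k=500)     # should be near 1/alpha = 0.5
```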
We present bounds for the finite-sample error of sequential Monte Carlo samplers on static spaces. Our approach explicitly relates the performance of the algorithm to properties of the chosen sequence of distributions and mixing properties of the associated Markov kernels. This allows us to give the first finite-sample comparison to other Monte Carlo schemes. We obtain bounds for the complexity of sequential Monte Carlo approximations for a variety of target distributions, such as those on finite spaces, product measures and log-concave distributions, including Bayesian logistic regression. The bounds obtained are within a logarithmic factor of similar bounds obtainable for Markov chain Monte Carlo.
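A compact sketch of a sequential Monte Carlo sampler on a static space (our own toy construction: a 1-D Gaussian target, a linear tempering schedule, multinomial resampling, and one random-walk Metropolis move per step):

```python
import numpy as np

rng = np.random.default_rng(7)

def smc_sampler(logtarget, n_particles=2000, betas=np.linspace(0, 1, 21)):
    """SMC on a static space: temper from the prior N(0, 3^2) to the target
    along pi_beta ∝ prior^(1-beta) * target^beta, with multinomial
    resampling and a random-walk Metropolis move at each step."""
    prior_std = 3.0
    x = prior_std * rng.standard_normal(n_particles)
    logprior = lambda z: -0.5 * (z / prior_std) ** 2
    for b0, b1 in zip(betas[:-1], betas[1:]):
        # Importance weights for the tempering increment.
        logw = (b1 - b0) * (logtarget(x) - logprior(x))
        w = np.exp(logw - logw.max())
        w /= w.sum()
        x = x[rng.choice(n_particles, size=n_particles, p=w)]  # resample
        # One random-walk Metropolis move targeting pi_{b1}.
        logp = lambda z: (1 - b1) * logprior(z) + b1 * logtarget(z)
        prop = x + 0.5 * rng.standard_normal(n_particles)
        accept = np.log(rng.uniform(size=n_particles)) < logp(prop) - logp(x)
        x = np.where(accept, prop, x)
    return x

samples = smc_sampler(lambda z: -0.5 * (z - 2.0) ** 2)  # target: N(2, 1)
```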
We consider the nonparametric multivariate isotonic regression problem, where the regression function is assumed to be nondecreasing with respect to each predictor. Our goal is to construct a Bayesian credible interval for the function value at a given interior point with assured limiting frequentist coverage. A natural prior on the regression function is given by a random step function with a suitable prior on increasing step-heights, but the resulting posterior distribution is hard to analyze theoretically due to the complicated order restriction on the coefficients. We instead put a prior on unrestricted step-functions, but make inference using the induced posterior measure by an “immersion map” from the space of unrestricted functions to that of multivariate monotone functions. This allows for maintaining the natural conjugacy for posterior sampling. A natural immersion map to use is a projection with respect to a distance function, but in the present context, a block isotonization map is found to be more useful. The approach of using the induced “immersion posterior” measure instead of the original posterior to make inference provides a useful extension of the Bayesian paradigm, particularly helpful when the model space is restricted by some complex relations. We establish a key weak convergence result for the posterior distribution of the function at a point in terms of some functional of a multi-indexed Gaussian process that leads to an expression for the limiting coverage of the Bayesian credible interval. Analogous to a recent result for univariate monotone functions, we find that the limiting coverage is slightly higher than the credibility, the opposite of a phenomenon observed in smoothing problems. Interestingly, the relation between credibility and limiting coverage does not involve any unknown parameter. Hence, by a recalibration procedure, we can get a predetermined asymptotic coverage by choosing a suitable credibility level smaller than the targeted coverage, and thus also shorten the credible intervals.
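In one dimension, the immersion idea can be illustrated with the pool-adjacent-violators projection (a sketch only: the paper works with a block isotonization map in the multivariate setting, and the Gaussian "posterior" draws below are purely stylized): sample unrestricted step heights, push each draw through the isotonization map, then read credible intervals off the immersed draws.

```python
import numpy as np

def pava(y):
    """Pool adjacent violators: L2 projection of y onto nondecreasing sequences."""
    vals, wts = [], []
    for v in map(float, y):
        vals.append(v); wts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            vals[-2:] = [(wts[-2] * vals[-2] + wts[-1] * vals[-1]) / w]
            wts[-2:] = [w]
    return np.repeat(vals, wts)

# Immersion-posterior sketch: draw unrestricted step heights, then immerse.
rng = np.random.default_rng(8)
truth = np.array([0.0, 0.5, 1.0, 1.5])
draws = truth + 0.3 * rng.standard_normal((1000, 4))  # unrestricted draws
monotone_draws = np.apply_along_axis(pava, 1, draws)  # immersed draws
lo, hi = np.quantile(monotone_draws[:, 2], [0.025, 0.975])  # credible interval
```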
We provide algorithms for regression with adversarial responses under large classes of non-i.i.d. instance sequences, on general separable metric spaces, with provably minimal assumptions. We also give characterizations of learnability in this regression context. We consider universal consistency, which asks for strong consistency of a learner without restrictions on the value responses. Our analysis shows that such an objective is achievable for a significantly larger class of instance sequences than stationary processes, and unveils a fundamental dichotomy between value spaces: whether finite-horizon mean estimation is achievable or not. We further provide optimistically universal learning rules, that is, such that if they fail to achieve universal consistency, any other algorithms will fail as well. For unbounded losses, we propose a mild integrability condition under which there exist algorithms for adversarial regression under large classes of non-i.i.d. instance sequences. In addition, our analysis also provides a learning rule for mean estimation in general metric spaces that is consistent under adversarial responses without any moment conditions on the sequence, a result of independent interest.
The asymptotic normality for a large family of eigenvalue statistics of a general sample covariance matrix is derived under the ultrahigh-dimensional setting, that is, when the dimension-to-sample-size ratio p/n tends to infinity. Based on this CLT result, we extend the covariance matrix test problem to the new ultrahigh-dimensional context, and apply it to test a matrix-valued white noise. Simulation experiments are conducted to investigate the finite-sample properties of the general asymptotic normality of eigenvalue statistics, as well as the two developed tests.