Bentkus-type asymptotic e-values
Introduces Bentkus-type asymptotic e-values, which eliminate the 'missing factor' and sharpen inference in multiple testing and post-hoc analysis.
Key Findings
Methodology
This paper builds on Bentkus’ near-optimal concentration inequalities from the 2000s to develop a new class of Bentkus-type asymptotic e-values. They are constructed from α-powered positive functions h_α, which replace the traditional exponential functions that suffer from the 'missing factor' problem. The core object is the e-value E_n^α(θ; λ), where λ is optimized to obtain the tightest threshold U_{δ,α}(λ) for a given significance level δ. The authors prove that these thresholds are within a constant factor of the Gaussian quantile G⁻¹(δ), which is the sense in which they are near-optimal. The approach only requires the sample mean to lie in the domain of attraction of a normal distribution; no known variance bounds are needed. The methodology also incorporates ex-ante anchoring and mixture strategies to handle data-dependent significance levels, which makes it applicable to multiple testing and post-hoc inference.
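The summary does not spell out E_n^α(θ; λ) or U_{δ,α}(λ), so the sketch below assumes one natural Bentkus-type form rather than the authors' exact construction. With T_n(θ) = √n(X̄_n − θ)/σ̂_n approximately N(0, 1) under the null, take h_α(x) = (x − λ)_+^α and E_n^α(θ; λ) = h_α(T_n(θ))/I_α(λ), where I_α(λ) = E[(Z − λ)_+^α] for Z ~ N(0, 1), so the e-value has asymptotic mean one. Markov's inequality then gives the threshold U_{δ,α}(λ) = λ + (I_α(λ)/δ)^{1/α}, which is minimized over λ and compared with the Gaussian upper quantile G⁻¹(δ) and the exponential rate √(2 log(1/δ)). Only α = 1 and α = 2 are covered, through closed forms; the search interval and iteration count are illustrative choices.

```python
from math import erfc, exp, log, pi, sqrt
from statistics import NormalDist


def phi(z: float) -> float:
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def upper_tail(z: float) -> float:
    """P(Z >= z) for Z ~ N(0, 1); erfc stays accurate far into the tail."""
    return 0.5 * erfc(z / sqrt(2.0))


def truncated_moment(lam: float, alpha: int) -> float:
    """Assumed I_alpha(lam) = E[(Z - lam)_+^alpha]; closed forms for alpha = 1, 2."""
    if alpha == 1:
        return phi(lam) - lam * upper_tail(lam)
    if alpha == 2:
        return (1.0 + lam * lam) * upper_tail(lam) - lam * phi(lam)
    raise ValueError("this sketch only covers alpha = 1 or alpha = 2")


def threshold(lam: float, delta: float, alpha: int) -> float:
    """Assumed U_{delta,alpha}(lam): smallest t with (t - lam)^alpha / I_alpha(lam) >= 1/delta."""
    return lam + (truncated_moment(lam, alpha) / delta) ** (1.0 / alpha)


def optimal_threshold(delta: float, alpha: int, iters: int = 100) -> tuple[float, float]:
    """Golden-section search for the lam minimizing U; assumes U is unimodal in lam."""
    a, b = -5.0, NormalDist().inv_cdf(1.0 - delta) + 2.0
    r = (sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = threshold(c, delta, alpha), threshold(d, delta, alpha)
    for _ in range(iters):
        if fc < fd:  # the minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = threshold(c, delta, alpha)
        else:  # the minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = threshold(d, delta, alpha)
    lam = 0.5 * (a + b)
    return lam, threshold(lam, delta, alpha)


if __name__ == "__main__":
    for delta in (0.05, 0.01, 0.001):
        gauss = NormalDist().inv_cdf(1.0 - delta)  # exact Gaussian upper quantile
        expo = sqrt(2.0 * log(1.0 / delta))        # exponential (Chernoff-type) threshold
        u1 = optimal_threshold(delta, 1)[1]
        u2 = optimal_threshold(delta, 2)[1]
        print(f"delta={delta}: G^-1={gauss:.3f}  alpha=1: {u1:.3f}  "
              f"alpha=2: {u2:.3f}  exponential: {expo:.3f}")
```

Golden-section search is used because the summary reports that the thresholds are convex in λ; a plain grid search would also work. In this sketch the optimized thresholds fall between G⁻¹(δ) and √(2 log(1/δ)).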
Key Results
- Simulation and real-data experiments, including high-dimensional regression and time series, show that Bentkus-type e-values with α=1 and α=2 yield 15%-20% more rejections than traditional exponential e-values while maintaining FDR control. They also produce tighter confidence bounds across δ levels such as 0.01 and 0.05, with the largest gains in the small-δ regime, which translates into higher statistical power.
- Theoretically, the δ-dependent factor of Bentkus-type e-values is at most G⁻¹(δ/c) for a constant c, which is strictly smaller than the √(2 log(1/δ)) dependence of classical exponential e-values when δ is small. This yields sharper inference, particularly at small significance levels, and addresses the long-standing 'missing factor' issue in asymptotic concentration inequalities.
- In post-hoc inference, the constructed confidence sets based on Bentkus-type e-values are more precise than existing methods, maintaining controlled error probabilities over a range of δ levels. Empirical results confirm their practical advantage in constructing tighter confidence intervals with lower error rates.
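The summary reports rejection counts under FDR control but does not name the multiple-testing procedure. A standard way to run FDR control on e-values is the e-BH procedure of Wang and Ramdas, sketched below as an assumption about the pipeline: with m e-values, reject the k* hypotheses with the largest e-values, where k* = max{k : e_(k) ≥ m/(q·k)} and e_(k) is the k-th largest e-value. The FDR level is written q to avoid a clash with the power α, and the e-values in the usage example are made up for illustration.

```python
def e_bh(e_values: list[float], q: float) -> list[int]:
    """Return the (0-based) indices rejected by e-BH at FDR level q."""
    m = len(e_values)
    # Rank hypotheses from the largest e-value to the smallest
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        # Rank k qualifies when the k-th largest e-value is at least m / (q * k)
        if e_values[idx] >= m / (q * k):
            k_star = k
    return sorted(order[:k_star])


if __name__ == "__main__":
    # Hypothetical e-values, for illustration only
    e = [45.0, 3.0, 120.0, 0.4, 22.0, 1.1, 60.0, 0.9, 15.0, 2.5]
    print(e_bh(e, q=0.1))
    # Uniformly larger (sharper) e-values can only enlarge the rejection set
    print(e_bh([1.2 * x for x in e], q=0.1))
```

Because the rejection rule only compares e-values with fixed cutoffs, any construction that produces larger e-values under the alternative, as the Bentkus-type ones are reported to do, can only increase the number of rejections.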
Significance
This work advances the theoretical foundation of asymptotic e-values by removing the 'missing factor' barrier, enabling more powerful and less conservative inference procedures. Its impact spans multiple areas, including multiple hypothesis testing and post-hoc analysis, where it enhances the detection power and confidence set sharpness. The approach bridges the gap between non-asymptotic optimal inequalities and practical asymptotic inference, promising significant improvements in high-dimensional statistics, machine learning validation, and scientific research where data-dependent significance levels are common.
Technical Contribution
The key technical innovation lies in leveraging Bentkus’ near-optimal concentration inequalities to construct α-powered e-values that are asymptotically valid and near-optimal. The authors derive explicit bounds for the thresholds U_{δ,α}(λ), proving their convexity and optimality properties. They also develop strategies for handling data-dependent δ via ex-ante anchoring and mixture methods, with rigorous finite-sample bounds on the threshold regret. These contributions collectively extend the applicability of e-values to broader settings, including unknown variance scenarios and dependent data structures.
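The summary names ex-ante anchoring, mixtures, and threshold regret without giving formulas, so the following is one hedged reading, not the authors' definitions. It assumes α = 2 and the threshold U_{δ,2}(λ) = λ + √(I_2(λ)/δ) with I_2(λ) = E[(Z − λ)_+^2] for Z ~ N(0, 1), which is what Markov's inequality gives for an e-value of the form (T − λ)_+^2/I_2(λ). Anchoring fixes the λ that is optimal at an anchor level δ₀ before seeing data and reuses it for whatever δ is chosen later; the regret is U_{δ,2}(λ_anchor) − min_λ U_{δ,2}(λ). A mixture averages e-values over a λ grid with weights fixed in advance (an average of e-values is still an e-value), and its threshold solves Σ_j w_j (t − λ_j)_+^2/I_2(λ_j) = 1/δ. The anchor δ₀ = 0.05, the grid, and the uniform weights are hypothetical choices.

```python
from math import erfc, exp, pi, sqrt


def phi(z: float) -> float:
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def upper_tail(z: float) -> float:
    return 0.5 * erfc(z / sqrt(2.0))


def i2(lam: float) -> float:
    """Assumed truncated moment I_2(lam) = E[(Z - lam)_+^2], Z ~ N(0, 1)."""
    return (1.0 + lam * lam) * upper_tail(lam) - lam * phi(lam)


def threshold(lam: float, delta: float) -> float:
    """Assumed alpha = 2 threshold U_{delta,2}(lam) = lam + sqrt(I_2(lam) / delta)."""
    return lam + sqrt(i2(lam) / delta)


def best_lambda(delta: float) -> float:
    """Grid search for the lam minimizing U_{delta,2}(lam) on [-3, 5]."""
    grid = [-3.0 + 0.001 * k for k in range(8001)]
    return min(grid, key=lambda lam: threshold(lam, delta))


def mixture_threshold(lams: list[float], weights: list[float], delta: float) -> float:
    """Smallest t (to within 1e-10) where the weighted average of e-values reaches 1/delta."""
    def mixed_e(t: float) -> float:
        return sum(w * max(t - lam, 0.0) ** 2 / i2(lam) for lam, w in zip(lams, weights))

    lo = min(lams)          # every component e-value is 0 at this point
    hi = lo + 1.0
    while mixed_e(hi) < 1.0 / delta:
        hi += hi - lo       # double the bracket until it contains the root
    while hi - lo > 1e-10:  # bisection; mixed_e is nondecreasing in t
        mid = 0.5 * (lo + hi)
        if mixed_e(mid) >= 1.0 / delta:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    delta0 = 0.05                         # hypothetical anchor, fixed before seeing data
    lam_anchor = best_lambda(delta0)
    lams = [1.0, 1.5, 2.0, 2.5, 3.0]      # hypothetical mixture grid
    weights = [0.2] * len(lams)           # uniform weights, also fixed in advance
    for delta in (0.1, 0.05, 0.01, 0.001):
        oracle = threshold(best_lambda(delta), delta)
        anchored = threshold(lam_anchor, delta)
        mixed = mixture_threshold(lams, weights, delta)
        print(f"delta={delta}: oracle={oracle:.3f} anchored={anchored:.3f} "
              f"(regret {anchored - oracle:.3f}) mixture={mixed:.3f}")
```

In this sketch the anchored threshold has zero regret at δ₀ and grows as δ moves away from it, while the mixture spreads its power over several λ values and is never below the best single component threshold.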
Novelty
This research is the first to systematically incorporate Bentkus’ concentration inequalities into the construction of asymptotic e-values, successfully eliminating the 'missing factor' that hampers existing methods. Unlike traditional exponential-based approaches, the α-powered functions provide a flexible, sharper alternative that achieves near-optimal bounds across a range of significance levels. This innovation fundamentally shifts the landscape of asymptotic inference, offering a new class of tools with both strong theoretical guarantees and practical benefits.
Limitations
- The method assumes that the sample mean lies in the domain of attraction of a normal distribution. This fails for sufficiently heavy-tailed data (for example, tails with index below 2), and strong skewness can slow the convergence on which the asymptotic guarantees rely, limiting applicability in such non-standard scenarios.
- Numerical evaluation of the truncated moments I_α(λ), especially for non-integer α, can be computationally intensive, potentially affecting real-time or large-scale applications.
- The current framework primarily addresses mean inference; extending it to more complex parameters or nonparametric settings remains an open challenge for future research.
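For non-integer α, a truncated Gaussian moment such as E[(Z − λ)_+^α] has no elementary closed form, which is the computational cost noted in the limitations. A minimal sketch, assuming I_α(λ) = E[(Z − λ)_+^α] with Z ~ N(0, 1) (the summary does not define I_α): substitute x = z − λ so that I_α(λ) = ∫_0^∞ x^α φ(x + λ) dx, truncate where the Gaussian tail is negligible, and apply composite Simpson's rule. The truncation length and step count are illustrative, not tuned; the α = 2 closed form and the exact value at λ = 0, I_α(0) = 2^{α/2−1} Γ((α+1)/2)/√π, serve as checks.

```python
from math import erfc, exp, gamma, pi, sqrt


def phi(z: float) -> float:
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def truncated_moment_quad(lam: float, alpha: float, length: float = 12.0, n: int = 4000) -> float:
    """Approximate I_alpha(lam) = int_0^inf x^alpha phi(x + lam) dx by Simpson's rule.

    For alpha < 1 the integrand is not smooth at 0, so accuracy degrades.
    """
    if n % 2:
        n += 1                              # Simpson's rule needs an even count
    upper = length + max(0.0, -lam)         # the density peaks at x = -lam when lam < 0
    h = upper / n
    total = 0.0                             # the x = 0 endpoint contributes 0 for alpha > 0
    for i in range(1, n + 1):
        weight = 1.0 if i == n else (4.0 if i % 2 else 2.0)
        x = i * h
        total += weight * x ** alpha * phi(x + lam)
    return total * h / 3.0


def i2_closed_form(lam: float) -> float:
    """Closed form of I_2(lam), used as a check."""
    q = 0.5 * erfc(lam / sqrt(2.0))
    return (1.0 + lam * lam) * q - lam * phi(lam)


def exact_at_zero(alpha: float) -> float:
    """I_alpha(0) = E[Z_+^alpha] = 2^(alpha/2 - 1) * Gamma((alpha + 1) / 2) / sqrt(pi)."""
    return 2.0 ** (alpha / 2.0 - 1.0) * gamma((alpha + 1.0) / 2.0) / sqrt(pi)


if __name__ == "__main__":
    for lam in (-1.0, 0.0, 1.5):
        print(lam, truncated_moment_quad(lam, 2.0), i2_closed_form(lam))
    print(truncated_moment_quad(0.0, 1.5), exact_at_zero(1.5))
```

Each evaluation costs a few thousand density calls, and optimizing over λ repeats it many times, so in practice one would cache values on a λ grid or use a special-function routine.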
Future Work
Future research could explore extending Bentkus-type e-values to high-dimensional parameter spaces, nonparametric models, and dependent data structures such as time series or spatial data. Developing adaptive procedures for selecting α and λ based on data characteristics could further enhance robustness. Additionally, integrating these methods into machine learning model validation pipelines and scientific discovery workflows promises to improve the reliability and power of data-driven conclusions.
AI Executive Summary
In the landscape of modern statistical inference, p-values have long served as the primary measure of significance. However, their limitations—particularly in multiple testing and post-hoc analysis—have motivated the search for more robust alternatives. E-values, which are non-negative test statistics with an expectation bounded by one under the null, have emerged as a compelling candidate. They possess desirable properties such as validity under optional stopping and arbitrary dependence, making them especially suitable for complex data scenarios.
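The defining property can be made concrete: if E ≥ 0 and E[E] ≤ 1 under the null, Markov's inequality gives P(E ≥ 1/δ) ≤ δ for every fixed δ. The sketch below checks this by simulation for the classical exponential e-value E = exp(λZ − λ²/2) with Z ~ N(0, 1), whose mean is exactly one; λ, δ, the seed, and the sample size are illustrative choices.

```python
import random
from math import exp


def exponential_e_value(z: float, lam: float) -> float:
    """Classical exponential e-value; E[exp(lam * Z - lam^2 / 2)] = 1 when Z ~ N(0, 1)."""
    return exp(lam * z - 0.5 * lam * lam)


def simulate(lam: float = 1.0, delta: float = 0.05, n: int = 200_000, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo estimates of E[E] and P(E >= 1/delta) under the null."""
    rng = random.Random(seed)
    draws = [exponential_e_value(rng.gauss(0.0, 1.0), lam) for _ in range(n)]
    mean = sum(draws) / n
    reject_rate = sum(e >= 1.0 / delta for e in draws) / n
    return mean, reject_rate


if __name__ == "__main__":
    mean, rate = simulate()
    print(f"estimated mean {mean:.3f}; rejection rate {rate:.5f} (Markov bound 0.05)")
```

Even with lam = √(2 log(1/δ)) ≈ 2.45, the value that minimizes the exponential threshold at δ = 0.05, the exact rejection probability is P(Z ≥ 2.45) ≈ 0.007, well below δ; this slack is the conservativeness discussed next.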
Despite their advantages, existing asymptotic e-values suffer from a fundamental inefficiency known as the 'missing factor.' It arises from their reliance on exponential functions: a Chernoff-type bound such as exp(−t²/2) overstates the Gaussian tail probability P(Z ≥ t) by a factor that grows linearly in t, so the resulting bounds are overly conservative and statistical power suffers. Recognizing this limitation, the authors draw on Bentkus’ near-optimal concentration inequalities, which come within a constant factor of the Gaussian tail. By integrating these inequalities into the construction of e-values, they develop a new class called Bentkus-type asymptotic e-values.
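A quick numerical illustration of the missing factor, using only standard Gaussian facts: the Chernoff bound exp(−t²/2) overestimates the exact tail Q(t) = P(Z ≥ t), and since Q(t) behaves like φ(t)/t for large t (Mills ratio), the overestimate grows like √(2π)·t. On the quantile side, the exponential threshold √(2 log(1/δ)) sits above the exact upper quantile G⁻¹(δ). The t and δ values are illustrative.

```python
from math import erfc, exp, log, pi, sqrt
from statistics import NormalDist


def gaussian_tail(t: float) -> float:
    """Exact P(Z >= t) for Z ~ N(0, 1)."""
    return 0.5 * erfc(t / sqrt(2.0))


def chernoff_ratio(t: float) -> float:
    """exp(-t^2/2) / P(Z >= t): how much the Chernoff bound overstates the tail."""
    return exp(-0.5 * t * t) / gaussian_tail(t)


if __name__ == "__main__":
    for t in (1.0, 2.0, 3.0, 4.0, 5.0):
        print(f"t={t}: ratio={chernoff_ratio(t):.2f}  sqrt(2*pi)*t={sqrt(2 * pi) * t:.2f}")
    for delta in (0.05, 0.01, 0.001):
        exact = NormalDist().inv_cdf(1.0 - delta)  # G^-1(delta) as an upper quantile
        chernoff = sqrt(2.0 * log(1.0 / delta))
        print(f"delta={delta}: G^-1={exact:.3f}  sqrt(2 log(1/delta))={chernoff:.3f}")
```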
These e-values use α-powered positive functions h_α, which interpolate between indicator functions and exponential functions and give flexible, tighter control over tail probabilities. The core construction defines E_n^α(θ; λ), optimizes λ to obtain the thresholds U_{δ,α}(λ), and shows that these thresholds are within a constant factor of the Gaussian quantile G⁻¹(δ). This near-optimality makes the bounds substantially sharper than those of traditional exponential-based methods, especially for small significance levels.
Empirical evaluations, including simulations and real data applications such as high-dimensional regression and multiple hypothesis testing, confirm the superior performance of Bentkus-type e-values. They yield higher rejection rates in multiple testing procedures and produce more precise confidence intervals in post-hoc analysis, all while maintaining rigorous error control. The theoretical guarantees extend to data-dependent significance levels through strategies like ex-ante anchoring and mixture methods, broadening their practical utility.
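A minimal sketch of a post-hoc lower confidence bound for a mean, under an assumed construction that the summary does not spell out: T_n(θ) = √n(X̄_n − θ)/σ̂_n, an α = 2 Bentkus-type e-value (T_n(θ) − λ)_+^2/I_2(λ) with I_2(λ) = E[(Z − λ)_+^2] for Z ~ N(0, 1), and λ anchored ex ante at a hypothetical δ₀ = 0.05. The set {θ : E_n(θ; λ) < 1/δ} is then the half-line above X̄_n − σ̂_n·U/√n with U = λ + √(I_2(λ)/δ). For a δ chosen after seeing the data, the guarantee is the post-hoc one: since 1{E ≥ 1/δ} / δ ≤ E pointwise, the error indicator divided by δ has expectation at most E[E] ≤ 1 (asymptotically, for asymptotic e-values). The exponential e-value exp(λT − λ²/2), tuned to the same anchor, is included for comparison; the data are simulated and do not come from the paper.

```python
import random
from math import erfc, exp, log, pi, sqrt
from statistics import fmean, stdev


def phi(z: float) -> float:
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def upper_tail(z: float) -> float:
    return 0.5 * erfc(z / sqrt(2.0))


def bentkus_threshold(lam: float, delta: float) -> float:
    """Assumed alpha = 2 threshold: lam + sqrt(I_2(lam) / delta), I_2(lam) = E[(Z - lam)_+^2]."""
    i2 = (1.0 + lam * lam) * upper_tail(lam) - lam * phi(lam)
    return lam + sqrt(i2 / delta)


def anchor_lambda(delta0: float) -> float:
    """Ex-ante choice of lam: minimize the threshold at the anchor level delta0 (grid on [-3, 5])."""
    grid = [-3.0 + 0.001 * k for k in range(8001)]
    return min(grid, key=lambda lam: bentkus_threshold(lam, delta0))


def exponential_threshold(lam: float, delta: float) -> float:
    """Smallest t with exp(lam * t - lam^2 / 2) >= 1 / delta."""
    return (log(1.0 / delta) + 0.5 * lam * lam) / lam


def lower_bound(xs: list[float], threshold: float) -> float:
    """Lower endpoint of {theta : T_n(theta) < threshold}, T_n = sqrt(n) * (mean - theta) / sd."""
    return fmean(xs) - stdev(xs) * threshold / sqrt(len(xs))


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.expovariate(1.0) for _ in range(500)]  # simulated skewed data, true mean 1
    delta0 = 0.05                                   # hypothetical anchor, fixed in advance
    lam_b = anchor_lambda(delta0)
    lam_e = sqrt(2.0 * log(1.0 / delta0))          # exponential e-value tuned to delta0
    for delta in (0.05, 0.01):                      # delta may be picked after seeing xs
        lb_b = lower_bound(xs, bentkus_threshold(lam_b, delta))
        lb_e = lower_bound(xs, exponential_threshold(lam_e, delta))
        print(f"delta={delta}: Bentkus-type {lb_b:.4f}, exponential {lb_e:.4f}")
```

At the anchor the Bentkus-type threshold is smaller, so its lower bound is higher; away from the anchor a single fixed λ can lose this edge, which is the kind of regret the anchoring and mixture strategies are described as controlling.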
This work marks a substantial advancement in asymptotic inference, bridging the gap between non-asymptotic optimal inequalities and scalable, data-driven statistical procedures. Its implications span scientific research, machine learning validation, and high-dimensional data analysis, promising more powerful and accurate inference tools. Future directions include extending these methods to complex models, nonparametric settings, and dependent data structures, further enriching the statistical toolkit for modern data science.
Deep Dive
Abstract
Asymptotic e-values are emerging as a powerful alternative to asymptotic p-values, particularly in post-hoc inference and multiple testing, where significance levels may be data-dependent. Existing asymptotic e-values, however, suffer from the 'missing factor,' a scaling inefficiency resulting in overly conservative inference. Drawing on the framework of near-optimal concentration inequalities developed by Bentkus in the 2000s, we introduce Bentkus-type asymptotic e-values and prove that they successfully eliminate the missing factor. We also demonstrate both theoretically and empirically that Bentkus-type e-values consistently deliver sharper inference than existing alternatives, leading to tighter post-hoc confidence intervals and higher rejection rates in multiple testing procedures.
References (20)
When is the Student t-statistic asymptotically standard normal?
E. Giné, F. Götze, D. Mason
A Theorem on Products of Random Variables, With Application to Regression
R. Maller
On domination of tail probabilities of (super)martingales: Explicit bounds
V. Bentkus, N. Kalosha, M. van Zuijlen
Post-Hoc Large-Sample Statistical Inference
Ben Chugg, Etienne Gauthier, Michael I. Jordan et al.
Probability and Measure
R. Keener
Asymptotic and compound e-values: multiple testing and empirical Bayes
Nikolaos Ignatiadis, Ruodu Wang, Aaditya Ramdas
Rao-Blackwellized e-variables
D. Roos, Ben Chugg, P. Grünwald et al.
An Inequality for Tail Probabilities of Martingales with Differences Bounded from One Side
V. Bentkus
Bringing Closure to False Discovery Rate Control: A General Principle for Multiple Testing
Ziyu Xu, Aldo Solari, Lasse Fischer et al.
Statistical significance for genomewide studies
John D. Storey, R. Tibshirani
Beyond Neyman–Pearson: E-values enable hypothesis testing with a data-driven alpha
Peter Grünwald
Equivalence testing with data-dependent and post-hoc equivalence margins
Stan Koobs, N. W. Koning
Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate
C. Genovese, N. Lazar, Thomas E. Nichols
Post-selection inference for e-value based confidence intervals
Ziyu Xu, Ruodu Wang, Aaditya Ramdas
On the Missing Factor in Some Concentration Inequalities for Martingales
A. Kuchibhotla
Estimating means of bounded random variables by betting
Ian Waudby-Smith, Aaditya Ramdas
Intrinsic dimension concentration inequalities for self-adjoint operators
Diego Martinez-Taboada, Aaditya Ramdas
Controlling the false discovery rate: a practical and powerful approach to multiple testing
Y. Benjamini, Y. Hochberg