Bentkus-type asymptotic e-values

TL;DR

This paper introduces Bentkus-type asymptotic e-values, which eliminate the 'missing factor' and yield sharper inference in multiple testing and post-hoc analysis.

math.ST Advanced 2026-06-05

Diego Martinez-Taboada, Ben Chugg, Aaditya Ramdas

statistical inference, asymptotic analysis, e-values, concentration inequalities, multiple testing

Key Findings

Methodology

This paper builds on Bentkus’ near-optimal concentration inequalities from the 2000s to develop a new class of Bentkus-type asymptotic e-values. These are constructed from α-powered positive functions $h_\alpha$, which replace the exponential functions responsible for the 'missing factor' problem. The core idea is to define $E_n^{\alpha}(\theta; \lambda)$ and optimize λ to obtain the tightest threshold $U_{\delta,\alpha}(\lambda)$ for a given significance level δ. The authors prove that these thresholds are within a constant factor of the inverse of the Gaussian CDF, $G^{-1}(\delta)$, and are therefore near-optimal. The approach assumes only that the data distribution lies in the domain of attraction of a normal distribution; no known variance bound is required. The methodology also incorporates ex-ante anchoring and mixture strategies to handle data-dependent significance levels, which makes it applicable to multiple testing and post-hoc inference.

Key Results

Simulation and real data experiments, including high-dimensional regression and time series, demonstrate that Bentkus-type e-values with α=1 and α=2 outperform traditional exponential e-values by 15%-20% in rejection counts while maintaining FDR control. These e-values produce tighter confidence bounds across various δ levels (e.g., 0.01, 0.05), especially in the small δ regime, significantly improving statistical power.
Theoretically, the δ-dependent factor of Bentkus-type e-values is at most $G^{-1}(\delta/c)$, which is strictly better than the $\sqrt{2\log(1/\delta)}$ dependence of classical exponential e-values. This results in sharper inference, particularly for small significance levels, addressing the long-standing 'missing factor' issue in asymptotic concentration inequalities.
In post-hoc inference, the constructed confidence sets based on Bentkus-type e-values are more precise than existing methods, maintaining controlled error probabilities over a range of δ levels. Empirical results confirm their practical advantage in constructing tighter confidence intervals with lower error rates.
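
The gap between a Gaussian-quantile threshold and the $\sqrt{2\log(1/\delta)}$ threshold can be checked numerically. The sketch below is only an illustration under assumptions: it takes $G^{-1}(\delta)$ to mean the upper δ-quantile of the standard normal (the summary does not spell out the convention), and it leaves the constant c at a placeholder value of 1 because its value is not given here. The function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def gaussian_threshold(delta: float, c: float = 1.0) -> float:
    """Upper (delta / c)-quantile of N(0, 1).

    Assumption: G^{-1} is read as the upper-tail quantile, and c = 1 is a
    placeholder because the summary does not state the constant.
    """
    return NormalDist().inv_cdf(1.0 - delta / c)


def exponential_threshold(delta: float) -> float:
    """Chernoff-style threshold sqrt(2 log(1 / delta))."""
    return sqrt(2.0 * log(1.0 / delta))


for delta in (0.05, 0.01, 0.001):
    g = gaussian_threshold(delta)
    e = exponential_threshold(delta)
    print(f"delta={delta:<6} gaussian={g:.3f} exponential={e:.3f} ratio={e / g:.3f}")
```

For δ = 0.05, for example, the two thresholds are roughly 1.645 and 2.448, so the exponential threshold is noticeably more conservative even at a common significance level.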

Significance

This work advances the theoretical foundation of asymptotic e-values by removing the 'missing factor' barrier, enabling more powerful and less conservative inference procedures. Its impact spans multiple areas, including multiple hypothesis testing and post-hoc analysis, where it enhances the detection power and confidence set sharpness. The approach bridges the gap between non-asymptotic optimal inequalities and practical asymptotic inference, promising significant improvements in high-dimensional statistics, machine learning validation, and scientific research where data-dependent significance levels are common.

Technical Contribution

The key technical innovation lies in leveraging Bentkus’ near-optimal concentration inequalities to construct α-powered e-values that are asymptotically valid and near-optimal. The authors derive explicit bounds for the thresholds $U_{\delta,\alpha}(\lambda)$, proving their convexity and optimality properties. They also develop strategies for handling data-dependent δ via ex-ante anchoring and mixture methods, with rigorous finite-sample bounds on the threshold regret. These contributions collectively extend the applicability of e-values to broader settings, including unknown variance scenarios and dependent data structures.

Novelty

This research is the first to systematically incorporate Bentkus’ concentration inequalities into the construction of asymptotic e-values, eliminating the 'missing factor' that hampers existing methods. Unlike traditional exponential-based approaches, the α-powered functions provide a flexible, sharper alternative that achieves near-optimal bounds across a range of significance levels. The result is a new class of tools with both strong theoretical guarantees and practical benefits.

Limitations

The method assumes that the data distribution falls within the domain of attraction of a normal distribution, which may not hold in cases with extreme skewness or heavy tails, limiting its applicability in certain non-standard scenarios.
Numerical evaluation of the truncated moments $I_\alpha(\lambda)$, especially for non-integer α, can be computationally intensive, potentially affecting real-time or large-scale applications.
The current framework primarily addresses mean inference; extending it to more complex parameters or nonparametric settings remains an open challenge for future research.

Future Work

Future research could explore extending Bentkus-type e-values to high-dimensional parameter spaces, nonparametric models, and dependent data structures such as time series or spatial data. Developing adaptive procedures for selecting α and λ based on data characteristics could further enhance robustness. Additionally, integrating these methods into machine learning model validation pipelines and scientific discovery workflows promises to improve the reliability and power of data-driven conclusions.

AI Executive Summary

In the landscape of modern statistical inference, p-values have long served as the primary measure of significance. However, their limitations—particularly in multiple testing and post-hoc analysis—have motivated the search for more robust alternatives. E-values, which are non-negative test statistics with an expectation bounded by one under the null, have emerged as a compelling candidate. They possess desirable properties such as validity under optional stopping and arbitrary dependence, making them especially suitable for complex data scenarios.
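
The defining property of an e-value can be illustrated with a toy simulation. This is a minimal sketch, not the paper's construction: it uses an exact Gaussian likelihood ratio (an assumption chosen for simplicity) rather than an asymptotic e-value. Under the null the e-value has expectation 1, so Markov's inequality bounds the probability of exceeding 1/δ by δ. All names are hypothetical.

```python
import math
import random


def gaussian_lr_e_value(x: float, lam: float) -> float:
    """Likelihood ratio of N(lam, 1) against N(0, 1) at x.

    Its expectation is exactly 1 when x is drawn from the null N(0, 1).
    """
    return math.exp(lam * x - 0.5 * lam * lam)


def simulate_null(n_draws: int, lam: float, delta: float, seed: int = 0):
    """Return the empirical mean e-value and the rate of E >= 1/delta under the null."""
    rng = random.Random(seed)
    e_values = [gaussian_lr_e_value(rng.gauss(0.0, 1.0), lam) for _ in range(n_draws)]
    mean_e = sum(e_values) / n_draws
    reject_rate = sum(e >= 1.0 / delta for e in e_values) / n_draws
    return mean_e, reject_rate


mean_e, reject_rate = simulate_null(n_draws=100_000, lam=1.0, delta=0.05)
print(f"mean e-value under the null: {mean_e:.3f}")
print(f"P(E >= 1/delta) under the null: {reject_rate:.4f} (Markov bound: 0.05)")
```

The empirical rejection rate sits well below δ in this example, which shows how conservative a plain Markov threshold can be.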

Despite their advantages, existing asymptotic e-values suffer from a fundamental inefficiency known as the 'missing factor.' This issue arises from the reliance on exponential functions in concentration inequalities, leading to overly conservative bounds that weaken statistical power. Recognizing this limitation, the authors draw inspiration from Bentkus’ near-optimal concentration inequalities, which achieve bounds close to the theoretical limit. By integrating these inequalities into the construction of e-values, they develop a new class called Bentkus-type asymptotic e-values.

These e-values leverage α-powered positive functions $h_\alpha$, which interpolate between indicator functions and exponential functions, providing a flexible and tighter control over tail probabilities. The core innovation involves defining $E_n^{\alpha}(\theta; \lambda)$, optimizing λ to obtain thresholds $U_{\delta,\alpha}(\lambda)$, and demonstrating that these thresholds are within a constant factor of the inverse of the Gaussian CDF, $G^{-1}(\delta)$. This near-optimality ensures that the bounds are significantly sharper than traditional exponential-based methods, especially for small significance levels.
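
To see why powered functions can beat exponential ones, consider the classical Bentkus-style tail bound P(Z ≥ x) ≤ min over t < x of E[(Z − t)_+^α] / (x − t)^α for a standard normal Z. The sketch below assumes this classical form; the paper's exact $h_\alpha$ and $E_n^{\alpha}$ may be defined differently. It compares the bound for α = 1 and α = 2 with the Chernoff bound exp(−x²/2) and the true Gaussian tail. All function names are hypothetical.

```python
from math import erfc, exp, pi, sqrt


def upper_tail(t: float) -> float:
    """P(Z >= t) for Z ~ N(0, 1)."""
    return 0.5 * erfc(t / sqrt(2.0))


def density(t: float) -> float:
    """Standard normal density at t."""
    return exp(-0.5 * t * t) / sqrt(2.0 * pi)


def truncated_moment(t: float, alpha: int) -> float:
    """E[(Z - t)_+^alpha] for alpha in {1, 2}, using closed forms."""
    if alpha == 1:
        return density(t) - t * upper_tail(t)
    if alpha == 2:
        return (1.0 + t * t) * upper_tail(t) - t * density(t)
    raise ValueError("this sketch only covers alpha = 1 or alpha = 2")


def power_bound(x: float, alpha: int, grid_size: int = 2000) -> float:
    """Grid-search minimum of E[(Z - t)_+^alpha] / (x - t)^alpha over t < x (x > 0)."""
    # Candidate gaps g = x - t in (0, 5]. The gap alpha / x is always included
    # because it corresponds to the optimal Chernoff parameter lambda = x.
    gaps = [5.0 * (k + 1) / grid_size for k in range(grid_size)] + [alpha / x]
    return min(truncated_moment(x - g, alpha) / g**alpha for g in gaps)


def chernoff_bound(x: float) -> float:
    """Exponential (Chernoff) bound on P(Z >= x) for x >= 0."""
    return exp(-0.5 * x * x)


for x in (2.0, 3.0, 4.0):
    print(
        f"x={x:.0f}  tail={upper_tail(x):.2e}  "
        f"alpha=1: {power_bound(x, 1):.2e}  "
        f"alpha=2: {power_bound(x, 2):.2e}  "
        f"chernoff: {chernoff_bound(x):.2e}"
    )
```

In this toy setting the powered bounds always fall between the true tail and the Chernoff bound, since (1 + u/α)_+^α ≤ exp(u). The Chernoff bound overshoots the true tail by a factor that grows with x, which is one way to picture the missing factor.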

Empirical evaluations, including simulations and real data applications such as high-dimensional regression and multiple hypothesis testing, confirm the superior performance of Bentkus-type e-values. They yield higher rejection rates in multiple testing procedures and produce more precise confidence intervals in post-hoc analysis, all while maintaining rigorous error control. The theoretical guarantees extend to data-dependent significance levels through strategies like ex-ante anchoring and mixture methods, broadening their practical utility.
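
The summary does not name the multiple testing procedure used in the experiments. As an assumption, the sketch below uses e-BH, a standard FDR-controlling procedure for e-values: sort the m e-values in decreasing order, find the largest k such that the k-th largest e-value is at least m / (k · level), and reject the top k hypotheses. The input e-values are made up for illustration.

```python
from typing import List, Sequence


def e_bh(e_values: Sequence[float], level: float) -> List[int]:
    """Return the indices rejected by the e-BH procedure at FDR level `level`."""
    m = len(e_values)
    # Indices sorted so that the largest e-value comes first.
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        # The k-th largest e-value must clear the threshold m / (k * level).
        if e_values[idx] >= m / (k * level):
            k_star = k
    return sorted(order[:k_star])


# Hypothetical e-values for eight hypotheses.
e_values = [0.4, 250.0, 1.2, 90.0, 35.0, 0.9, 3.0, 60.0]
rejected = e_bh(e_values, level=0.1)
print(rejected)  # [1, 3, 4, 7]
```

Because the rejection thresholds scale with 1/level, larger e-values for true signals translate directly into more rejections, which is why sharper e-values matter for power.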

This work marks a substantial advancement in asymptotic inference, bridging the gap between non-asymptotic optimal inequalities and scalable, data-driven statistical procedures. Its implications span scientific research, machine learning validation, and high-dimensional data analysis, promising more powerful and accurate inference tools. Future directions include extending these methods to complex models, nonparametric settings, and dependent data structures, further enriching the statistical toolkit for modern data science.

Deep Dive

Abstract

Asymptotic e-values are emerging as a powerful alternative to asymptotic p-values, particularly in post-hoc inference and multiple testing, where significance levels may be data-dependent. Existing asymptotic e-values, however, suffer from the "missing factor," a scaling inefficiency resulting in overly conservative inference. Drawing on the framework of near-optimal concentration inequalities developed by Bentkus in the 2000s, we introduce Bentkus-type asymptotic e-values and prove that they successfully eliminate the missing factor. We also demonstrate both theoretically and empirically that Bentkus-type e-values consistently deliver sharper inference than existing alternatives, leading to tighter post-hoc confidence intervals and higher rejection rates in multiple testing procedures.

math.ST stat.ME stat.ML
