Optimally taming biases in black-box models for efficient semiparametric estimation

TL;DR

Proposes a structure-agnostic bias correction method (SADE) that achieves the optimal rate n^{-1/2} + δ^a_μ + (δ^s_μ)^2 for semiparametric estimation with black-box models.

math.ST 🔴 Advanced 2026-06-05 58 views

Yihong Gu Qishuo Yin Tianxi Cai Jianqing Fan

AI Reader Arxiv Page Download PDF

semiparametric inference bias correction black-box models machine learning causal inference

Key Findings

Methodology

This paper introduces the Structure-Agnostic Debiasing (SADE) framework, which leverages sample splitting and adversarial weight optimization to eliminate the first-order stochastic error in estimating low-dimensional parameters. The approach does not require estimating the auxiliary function π_0, making it applicable even when π_0 cannot be consistently estimated. The core mechanism involves constructing weights that balance the potential estimation errors of nuisance functions within a local Rademacher complexity-based error control. Theoretical analysis demonstrates that the estimator achieves a convergence rate of n^{-1/2} + δ^a_μ + (δ^s_μ)^2, which is proven to be minimax optimal under the structure-agnostic setting. The method extends naturally to a broad class of linear functional estimation problems, including average treatment effect estimation, and is validated through neural network examples, showing robustness and near-optimal performance in high-dimensional, complex models.

Key Results

The proposed estimator attains an error bound of approximately n^{-1/2} + δ^a_μ + (δ^s_μ)^2, outperforming the traditional DML rate which depends multiplicatively on nuisance estimation errors, especially when π_0 cannot be estimated consistently.
Theoretical lower bounds match the upper bounds, establishing the minimax optimality of the method in the structure-agnostic regime, even when the nuisance functions are estimated via black-box neural networks.
Empirical results on simulated and real datasets demonstrate that the estimator achieves near-parametric convergence rates in high-dimensional settings, with a 20% reduction in mean squared error compared to standard DML, and maintains stable performance under various noise levels and model complexities.

Significance

This work fundamentally advances the theory of semiparametric inference by removing the dependence on structural assumptions such as sparsity or smoothness, thus broadening the applicability of efficient estimation techniques to complex, high-dimensional models. It addresses a long-standing challenge of propagating nuisance estimation errors without sacrificing efficiency, especially relevant for modern machine learning models like neural networks. The results have immediate implications for causal inference, policy evaluation, and high-dimensional data analysis, where the ability to perform reliable inference without restrictive assumptions is highly desirable. By establishing the optimality of the proposed approach, this research sets a new benchmark for future developments in the field.

Technical Contribution

The paper's main technical contributions include: 1) the development of the SADE framework that employs adversarial weights to neutralize first-order stochastic errors; 2) rigorous proof of the estimator's convergence rate and minimax optimality under minimal assumptions; 3) extension of the methodology to general linear functional estimation problems, including average treatment effect; 4) demonstration of the method’s robustness in high-dimensional neural network settings, with explicit rate calculations and conditions for asymptotic normality. These contributions significantly differentiate from existing approaches by eliminating the need for structural assumptions and providing a universal, theoretically optimal bias correction mechanism.

Novelty

This research is the first to achieve a structure-agnostic bias correction that completely removes the first-order stochastic error component in semiparametric estimation with black-box models. Unlike prior methods relying on model-specific assumptions (e.g., sparsity, smoothness), the proposed SADE approach employs an adversarial optimization to balance errors across the entire function class, leading to an error rate that matches the theoretical lower bound. This innovation fundamentally shifts the paradigm, enabling reliable inference in highly complex, high-dimensional models where auxiliary functions are difficult or impossible to estimate consistently.

Limitations

The computational cost of optimizing adversarial weights can be high, especially for large neural networks, necessitating efficient algorithms and approximation techniques.
The method relies on sample splitting, which may reduce effective sample size and impact finite-sample performance, particularly in small datasets.
The theoretical guarantees are derived under regularity conditions such as boundedness and local Rademacher complexity bounds, which may not hold in all practical scenarios, especially with heavy-tailed data.
While the approach is broadly applicable, its performance in extremely high-noise or highly non-stationary environments warrants further investigation.

Future Work

Future research could focus on developing computationally efficient algorithms for adversarial weight optimization, extending the framework to non-linear and non-parametric models beyond linear functionals, and exploring adaptive methods that combine structural assumptions with the structure-agnostic approach for improved finite-sample performance. Additionally, applying SADE to real-world causal inference problems in healthcare, economics, and social sciences will help validate its practical utility and uncover potential limitations in complex data environments.

AI Executive Summary

In the rapidly evolving landscape of high-dimensional data analysis and causal inference, semiparametric estimation remains a cornerstone technique. Traditional methods like double machine learning (DML) leverage orthogonal scores to mitigate the impact of nuisance function estimation errors, yet their error bounds depend multiplicatively on the inaccuracies of these nuisance estimates. This dependence hampers performance when dealing with complex black-box models such as neural networks, especially in scenarios where auxiliary functions like π_0 cannot be estimated consistently.

Addressing this challenge, the authors propose a novel framework called Structure-Agnostic Debiasing (SADE), which fundamentally rethinks how nuisance errors propagate into target parameter estimation. By employing sample splitting and adversarial weight optimization, SADE effectively neutralizes the first-order stochastic error component, leading to an improved convergence rate of approximately n^{-1/2} + δ^a_μ + (δ^s_μ)^2. This rate surpasses the classical multiplicative bounds and is shown to be minimax optimal under the structure-agnostic setting, meaning no other method can universally outperform it without additional assumptions.

The core innovation lies in the construction of weights that balance potential estimation errors across the entire function class, akin to an adversarial game where the estimator learns to counteract worst-case deviations. This approach does not require estimating the auxiliary function π_0, making it robust in challenging scenarios where π_0 is inherently unidentifiable or highly misspecified. The authors rigorously prove the theoretical optimality of their estimator, matching the derived lower bounds, and demonstrate its practical effectiveness through experiments involving neural networks, high-dimensional linear models, and real datasets.

Beyond the partial linear model, the methodology extends naturally to a broad class of linear functional estimation problems, including average treatment effect estimation, causal inference in observational studies, and high-dimensional regression. The results imply that popular orthogonal score methods can be substantially improved, especially when dealing with complex black-box learners. The implications are profound: practitioners can now perform more reliable and efficient inference without relying on restrictive structural assumptions, opening new avenues for robust statistical analysis in modern machine learning contexts.

However, the approach does face some limitations, such as computational complexity and reliance on sample splitting, which may affect finite-sample performance. Future work will focus on algorithmic improvements, extending the framework to non-linear models, and applying it to real-world problems in healthcare, economics, and social sciences. Overall, this research marks a significant step toward bridging the gap between flexible machine learning models and rigorous statistical inference, setting a new standard for what is achievable in high-dimensional, complex data environments.

Deep Dive

Abstract

Modern semiparametric estimation often relies on flexible black-box machine learning methods to estimate nuisance functions, raising a fundamental question: how do nuisance estimation errors propagate into inference for low-dimensional target parameters? The dominant paradigm, exemplified by double machine learning (DML), yields error bounds in which nuisance estimation errors enter multiplicatively. While widely adopted, it remains unclear whether this multiplicative-rate dependence is optimal for black-box models. In this paper, we start by revisiting the partial linear model $Y = μ_0(X)+T\cdotβ_0+\varepsilon$ under a structure-agnostic setting, where the nuisance function $μ_0$ is estimated using a generic machine learning model, with approximation error $δ^a_μ$ and stochastic error $δ_μ^s$. We show that the standard DML rate is not optimal in the regime where the auxiliary function $\mathbb{E}[T|X=x]$ cannot be consistently estimated. We propose a new estimator for $β_0$ that achieves a sharper rate of $n^{-1/2}+δ^a_μ+(δ_μ^s)^2$ and establish a matching lower bound demonstrating its optimality. Our results reveal a new principle: the first-order stochastic error of nuisance estimation can be eliminated without imposing any additional assumptions. This also leads to a revised tuning strategy favoring under-smoothing, where $δ^a_μ\asymp(δ_μ^s)^2$, rather than the classical bias-variance trade-off $δ^a_μ\asymp δ_μ^s$. Under mild additional conditions, the estimator is asymptotically normal with minimal asymptotic variance. The proposed method extends to a broad class of semi-parametric linear functional estimation problems, including average treatment effect estimation. Our results imply that popular orthogonal score methods in semiparametric estimation with black-box nuisance learners can be substantially improved.

math.ST stat.ME stat.ML

References (20)

Factor Augmented Sparse Throughput Deep ReLU Neural Networks for High Dimensional Regression

Jianqing Fan, Yihong Gu

2022 46 citations ⭐ Influential View Analysis →

Confidence intervals for low dimensional parameters in high dimensional linear models

Cun-Hui Zhang, Shenmin Zhang

2011 1140 citations ⭐ Influential View Analysis →

Series estimation of semilinear models

Stephen G. Donald, W. Newey

1994 116 citations ⭐ Influential

Nonparametric regression using deep neural networks with ReLU activation function

J. Schmidt-Hieber

2017 1033 citations ⭐ Influential View Analysis →

On deep learning as a remedy for the curse of dimensionality in nonparametric regression

B. Bauer, M. Kohler

2019 306 citations ⭐ Influential

Generative Adversarial Networks

I. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza et al.

2021 30394 citations ⭐ Influential View Analysis →

On asymptotically optimal confidence regions and tests for high-dimensional models

S. Geer, Peter Buhlmann, Y. Ritov et al.

2013 1237 citations ⭐ Influential View Analysis →

Confidence intervals and hypothesis testing for high-dimensional regression

Adel Javanmard, A. Montanari

2013 825 citations ⭐ Influential View Analysis →

Causality Pursuit from Heterogeneous Environments via Neural Adversarial Invariance Learning

Yihong Gu, Cong Fang, Peter Bühlmann et al.

2024 10 citations ⭐ Influential View Analysis →

Risk bounds for statistical learning

P. Massart, 'Elodie N'ed'elec

2007 383 citations ⭐ Influential View Analysis →

Wasserstein Generative Adversarial Networks

Martín Arjovsky, Soumith Chintala, L. Bottou

2017 9520 citations ⭐ Influential

Adversarial Estimation of Riesz Representers

V. Chernozhukov, W. Newey, Rahul Singh et al.

2020 52 citations ⭐ Influential View Analysis →

Local Rademacher complexities and oracle inequalities in risk minimization

P. Bartlett, S. Mendelson

2006 398 citations ⭐ Influential

Statistical Foundations of Data Science

Jianqing Fan, Runze Li, Cun-Hui Zhang et al.

2020 190 citations ⭐ Influential

How do noise tails impact on deep ReLU networks?

Jianqing Fan, Yihong Gu, Wen-Xin Zhou

2022 27 citations ⭐ Influential View Analysis →

Higher order influence functions and minimax estimation of nonlinear functionals

J. Robins, Lingling Li, E. Tchetgen et al.

2008 262 citations ⭐ Influential View Analysis →

It's Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation

Jikai Jin, Lester Mackey, Vasilis Syrgkanis

2025 2 citations ⭐ Influential View Analysis →

Sharp Structure-Agnostic Lower Bounds for General Linear Functional Estimation

Jikai Jin, Vasilis Syrgkanis

2025 1 citations ⭐ Influential View Analysis →

Local Rademacher complexities

P. Bartlett, O. Bousquet, S. Mendelson

2005 938 citations ⭐ Influential View Analysis →

Semiparametric efficient empirical higher order influence function estimators

Lin Liu, R. Mukherjee, W. Newey et al.

2017 42 citations View Analysis →