Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models

TL;DR

Proposes KDE-gradient conservative drifting with finite-particle convergence rates up to N^{-(2-β)/(2(d+4-β))}

stat.ML 🔴 Advanced 2026-05-22

Krishnakumar Balasubramanian

generative modeling kernel density estimation drifting models finite-particle analysis PDE

Key Findings

Methodology

This paper introduces a conservative drifting method for one-step generative modeling by replacing the traditional displacement-based drifting velocity with a kernel density estimator (KDE) gradient velocity. Specifically, the velocity is defined as the difference between the kernel-smoothed data score and the kernel-smoothed model score, forming a gradient field that addresses the non-conservatism issue inherent in displacement-based drifting fields. Leveraging a joint-entropy identity, the authors derive continuous-time finite-particle convergence bounds on ℝ^d, encompassing empirical Stein drift, the smoothed Fisher discrepancy of the KDE, and the squared center velocity. A key finite-particle correction term is the reciprocal-KDE self-interaction, for which deterministic and high-probability local occupancy conditions are provided to ensure control. The analysis explicitly tracks quadrature constants and their bandwidth dependence, proving that under an h-uniform quadrature regularity condition, the root residual velocity rate is N^{-1/(d+4)}, while a more general growth condition yields an optimized root rate of N^{-(2-β)/(2(d+4-β))} with 0 ≤ β < 2. Additionally, the paper analyzes the non-conservative drifting method with a Laplace kernel, decomposing its velocity into a positive scalar preconditioning of a sharp-score mismatch plus a scale-mismatch residual, resulting in a finite-particle rate with an unavoidable residual term. Finally, the continuous-time residual velocity bounds are translated into one-step generation guarantees via an explicit drift size η.
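A minimal sketch of the velocity described above, assuming a Gaussian kernel; the function names `gaussian_kde_score` and `conservative_velocity` are illustrative and not taken from the paper. For a Gaussian kernel the score of the KDE at z is a softmax-weighted average of (x_j - z)/h^2, so the velocity is the difference of two such averages.

```python
import numpy as np

def gaussian_kde_score(z, points, h):
    """Gradient of the log of a Gaussian KDE built on `points`, evaluated at z."""
    diffs = points - z                                # (N, d): x_j - z
    log_w = -np.sum(diffs**2, axis=1) / (2.0 * h**2)  # log kernel weights
    w = np.exp(log_w - log_w.max())                   # numerically stable softmax
    w /= w.sum()
    return (w[:, None] * diffs).sum(axis=0) / h**2

def conservative_velocity(z, data, particles, h):
    """Kernel-smoothed data score minus kernel-smoothed model score at z."""
    return gaussian_kde_score(z, data, h) - gaussian_kde_score(z, particles, h)

# Toy check with one data point y and one model particle x (assumed values).
z = np.array([0.5, -0.2])
data = np.array([[1.0, 1.0]])
particles = np.array([[0.0, 0.0]])
v = conservative_velocity(z, data, particles, h=0.5)
```

With a single data point y and a single particle x, the two scores are (y - z)/h^2 and (x - z)/h^2, so `v` equals (y - x)/h^2 = (4, 4) regardless of z. When the data and particle sets coincide, the velocity vanishes.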

Key Results

The joint-entropy identity establishes that the empirical Stein drift averaged over time is bounded by H_N(0)/(N T) + a_h Λ_T / N, providing a finite-particle convergence rate for the conservative method.
Under an h-uniform quadrature regularity condition, the root residual velocity achieves the rate N^{-1/(d+4)}; a more general growth condition yields the optimized root rate N^{-(2-β)/(2(d+4-β))}, where 0 ≤ β < 2 parametrizes the growth condition.
For the non-conservative Laplace kernel drifting method, the velocity decomposes into a sharp-score mismatch scaled by a positive local factor plus a scale-mismatch residual, leading to finite-particle convergence rates that include an unavoidable residual term, highlighting intrinsic limitations.
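As a consistency check on the two stated rates, the general exponent can be evaluated at β = 0 and differentiated in β. This is elementary algebra on the formulas quoted above, not a derivation from the paper:

```latex
\left.\frac{2-\beta}{2(d+4-\beta)}\right|_{\beta=0}
  = \frac{2}{2(d+4)} = \frac{1}{d+4},
\qquad
\frac{\partial}{\partial\beta}\,\frac{2-\beta}{2(d+4-\beta)}
  = -\frac{d+2}{2(d+4-\beta)^{2}} < 0 .
```

So β = 0 recovers the N^{-1/(d+4)} rate, and the exponent shrinks monotonically as β increases toward 2.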

Significance

This work addresses the non-conservatism of displacement-based drifting velocities in one-step generative modeling by proposing a conservative drifting method with finite-particle guarantees. It restores the gradient field structure of the drifting velocity, which makes the particle dynamics mathematically tractable. The finite-particle convergence analysis, including explicit bandwidth dependence, advances theoretical understanding and informs kernel and bandwidth selection. These results support the development of efficient and stable one-step generative models, which matter for real-time inference.

Technical Contribution

The paper's technical contributions include: 1) Introducing a KDE-gradient based conservative drifting velocity that restores the gradient field structure absent in traditional displacement velocities; 2) Deriving continuous-time finite-particle joint entropy identities that yield convergence bounds for empirical Stein drift, smoothed Fisher discrepancy, and center velocity; 3) Identifying and controlling the reciprocal-KDE self-interaction term via local occupancy conditions to ensure stability; 4) Explicitly tracking quadrature constants and their bandwidth dependencies, enabling optimized bandwidth selection and convergence rates; 5) Providing a parallel analysis of the non-conservative Laplace kernel drifting method via companion kernel decomposition, revealing inherent residual terms and their impact on convergence. These advances deepen the theoretical foundation of drifting models and open avenues for improved generative modeling algorithms.

Novelty

According to this summary, the paper is the first to systematically analyze finite-particle convergence rates for KDE-gradient conservative drifting methods via a joint entropy framework, addressing the non-conservatism of displacement velocities in drifting models. Unlike prior work such as Deng et al. (2026), which introduced non-conservative displacement-based velocities and sharp normalizations, this work restores conservatism and provides finite-particle error bounds with explicit reciprocal-KDE self-interaction control. The bandwidth-dependent convergence rates and the companion analysis of non-conservative Laplace drifting are the main new elements.

Limitations

The conservative drifting method requires kernels with high-order smoothness, excluding non-smooth kernels like the Laplace kernel, which limits the method's general applicability.
Control of the reciprocal-KDE self-interaction term depends on local occupancy conditions and careful bandwidth selection, which may be challenging in practice and affect numerical stability.
The non-conservative Laplace drifting method inherently contains scale-mismatch residuals that cannot be eliminated, restricting achievable convergence rates.

Future Work

Future research directions include extending the conservative drifting framework to accommodate non-smooth kernels, such as by introducing smooth regularizations of the Laplace kernel; developing dynamic control mechanisms for the reciprocal-KDE self-interaction term to enhance stability and robustness, especially in high-dimensional settings; exploring multi-step drifting strategies combined with conservative drifting to improve generation accuracy and efficiency; and generalizing the theoretical analysis to complex, high-dimensional data distributions to bridge theory and practical applications.

AI Executive Summary

Generative modeling has become a cornerstone of modern machine learning, with one-step generative models gaining attention due to their efficiency in inference. However, existing displacement-based drifting velocities used in these models suffer from non-conservatism, lacking a gradient field structure, which undermines theoretical guarantees and training stability. Addressing this fundamental limitation, the present work proposes a conservative drifting method based on kernel density estimator (KDE) gradients, replacing the traditional displacement velocity with the difference between kernel-smoothed data and model scores. This construction restores the gradient field property, ensuring mathematically well-behaved particle dynamics.

The authors analyze the finite-particle system dynamics using a joint-entropy identity, deriving continuous-time convergence bounds for the empirical Stein drift, the smoothed Fisher discrepancy, and the squared center velocity. A critical correction term, the reciprocal-KDE self-interaction, is identified and controlled via deterministic and high-probability local occupancy conditions. By explicitly tracking quadrature constants and their dependence on kernel bandwidth, the analysis shows that under an h-uniform quadrature regularity condition, the root residual velocity converges at rate N^{-1/(d+4)}, while a more general growth condition yields an optimized rate of N^{-(2-β)/(2(d+4-β))} for 0 ≤ β < 2.

In addition to the conservative method, the paper examines the non-conservative drifting approach with the Laplace kernel, decomposing its velocity into a positive scalar preconditioning of a sharp-score mismatch plus a scale-mismatch residual. This decomposition elucidates an unavoidable residual term that limits finite-particle convergence rates, highlighting the advantages of the conservative approach.
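To illustrate how a normalized Laplace displacement can be written as a positive scalar times a score, the sketch below checks one elementary identity numerically. The function G(r) = h (r + h) exp(-r/h) is chosen here for illustration because its gradient reproduces the Laplace-weighted displacement; it is an assumption of this sketch and may differ from the sharp companion kernel used in the paper. The scale-mismatch residual is not reproduced here.

```python
import numpy as np

def laplace_displacement(x, points, h):
    """Normalized displacement: sum_j k(r_j)(y_j - x) / sum_j k(r_j), k(r) = exp(-r/h)."""
    diffs = points - x
    w = np.exp(-np.linalg.norm(diffs, axis=1) / h)
    return (w[:, None] * diffs).sum(axis=0) / w.sum()

def kernel_sum(x, points, h):
    """q(x) = sum_j exp(-r_j / h)."""
    return np.sum(np.exp(-np.linalg.norm(points - x, axis=1) / h))

def companion_sum(x, points, h):
    """Q(x) = sum_j G(r_j) with the illustrative choice G(r) = h (r + h) exp(-r/h)."""
    r = np.linalg.norm(points - x, axis=1)
    return np.sum(h * (r + h) * np.exp(-r / h))

def grad_fd(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = eps
        g[k] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return g

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])  # assumed toy samples
x = np.array([0.3, 0.7])
h = 1.0

Q = companion_sum(x, points, h)
q = kernel_sum(x, points, h)
score_Q = grad_fd(lambda z: np.log(companion_sum(z, points, h)), x)
preconditioned = (Q / q) * score_Q          # positive scalar times a score
direct = laplace_displacement(x, points, h)
```

Here `direct` and `preconditioned` agree up to finite-difference error, and the scalar Q/q is positive because both sums are positive.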

The theoretical results are connected to practical one-step generation by relating continuous-time residual velocity bounds to explicit drift step sizes, providing guarantees on the quality of generated samples. Although the work is primarily theoretical, it lays a solid foundation for designing stable and efficient one-step generative models with rigorous convergence guarantees.

Looking forward, the study opens avenues for extending conservative drifting to non-smooth kernels, refining reciprocal-KDE self-interaction control, and integrating multi-step drifting schemes. These developments promise to enhance the scalability and applicability of drifting models across diverse high-dimensional data domains, bridging the gap between theory and real-world generative modeling challenges.

Deep Analysis

Background

Generative modeling has witnessed significant advances in recent years, particularly with diffusion and score-based models demonstrating impressive capabilities in generating high-fidelity images, audio, and other modalities. These models typically rely on multi-step sampling procedures, which, while effective, incur high computational costs and latency during inference. To address these challenges, Deng et al. (2026) introduced drifting models, which shift the computational burden to training by directly moving the model distribution via a drift velocity field, enabling one-step generation at inference. The core idea is to design a drift velocity that attracts particles toward the data distribution while repelling them from the model distribution, thereby evolving the particle cloud toward the target.

However, the original displacement-based drifting velocities often lack a conservative (gradient) structure, complicating theoretical analysis and potentially causing instability in finite-particle systems. This non-conservatism arises from position-dependent normalization factors inherent in the velocity definition, except in special cases like Gaussian kernels. Recent efforts have sought to restore conservatism by introducing sharp normalizations or kernel gradient constructions, but a comprehensive finite-particle convergence analysis remained elusive. This paper builds upon and extends these foundational works by rigorously analyzing KDE-gradient conservative drifting velocities and their finite-particle behavior.

Core Problem

The central problem addressed is the non-conservatism of displacement-based drifting velocities in one-step generative models. Non-conservative velocity fields lack a potential function whose gradient they represent, leading to dynamics that are harder to analyze and potentially unstable. This issue is exacerbated by position-dependent normalization terms in the velocity, which break gradient structure except for Gaussian kernels. Consequently, finite-particle systems driven by such velocities may exhibit uncontrolled behavior, impeding theoretical guarantees on convergence and stability. The challenge is to design a drifting velocity that is both conservative and effective at guiding particles toward the data distribution, while enabling rigorous finite-particle convergence analysis and practical algorithmic implementation.
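A field on ℝ^d with a symmetric Jacobian is locally a gradient, so Jacobian asymmetry gives a quick numerical test of conservatism. The sketch below applies it to a single normalized displacement term (one attraction component, not the full drifting velocity) with a Gaussian and a Laplace kernel. The sample points, query point, and bandwidth are assumed toy values.

```python
import numpy as np

def normalized_displacement(x, points, kernel):
    """sum_j k(|x - y_j|)(y_j - x) / sum_j k(|x - y_j|)."""
    diffs = points - x
    w = kernel(np.linalg.norm(diffs, axis=1))
    return (w[:, None] * diffs).sum(axis=0) / w.sum()

def jacobian_asymmetry(field, x, eps=1e-5):
    """Largest entry of |J - J^T|, with J estimated by central differences."""
    d = x.size
    J = np.zeros((d, d))
    for k in range(d):
        e = np.zeros(d)
        e[k] = eps
        J[:, k] = (field(x + e) - field(x - e)) / (2.0 * eps)
    return np.abs(J - J.T).max()

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
x = np.array([0.3, 0.7])
h = 1.0

gauss = lambda r: np.exp(-r**2 / (2.0 * h**2))
laplace = lambda r: np.exp(-r / h)

asym_gauss = jacobian_asymmetry(
    lambda z: normalized_displacement(z, points, gauss), x)
asym_laplace = jacobian_asymmetry(
    lambda z: normalized_displacement(z, points, laplace), x)
```

For the Gaussian kernel the normalized displacement equals h^2 times the gradient of the log kernel sum, so `asym_gauss` is zero up to finite-difference error. For the Laplace kernel the position-dependent normalization breaks the symmetry and `asym_laplace` is clearly nonzero.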

Innovation

The paper introduces several key innovations:

1) Conservative KDE-gradient drifting velocity: The authors replace the displacement-based velocity with the difference between kernel-smoothed data and model scores, forming a gradient field that restores conservatism and enables stable particle dynamics.

2) Joint-entropy framework: By leveraging and extending the joint entropy approach from Stein Variational Gradient Descent (SVGD) literature, the authors derive continuous-time finite-particle convergence bounds that tightly control empirical Stein drift, smoothed Fisher discrepancy, and center velocity.

3) Reciprocal-KDE self-interaction term: The analysis identifies a novel finite-particle correction term arising from the KDE denominator’s dependence on particle positions. The authors establish deterministic and high-probability local occupancy conditions to control this term, ensuring stability.

4) Explicit bandwidth-dependent quadrature constants: The work tracks how kernel bandwidth influences quadrature errors and convergence rates, enabling optimized bandwidth selection strategies.

5) Non-conservative Laplace kernel analysis: The paper provides a companion analysis for the original displacement-based Laplace kernel drifting method, decomposing its velocity into a sharp-score mismatch and an unavoidable scale-mismatch residual, clarifying intrinsic limitations.

These innovations collectively advance the theoretical foundations and practical understanding of drifting models for one-step generative modeling.
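The role of local occupancy in controlling a reciprocal-KDE quantity can be sketched as follows. The exact definition of the self-interaction term is not given in this summary, so the code only inspects 1/q_x(X_i) as an assumed proxy, using a Gaussian kernel. If m_i particles (counting X_i itself) lie within distance c·h of X_i, then q_x(X_i) is at least (m_i/N) times the kernel value at radius c·h.

```python
import numpy as np

def kde_at_particles(X, h):
    """q_x(X_i) = (1/N) sum_j K_h(X_i - X_j) with a Gaussian kernel, self term included."""
    N, d = X.shape
    sq_dists = np.sum((X[:, None, :] - X[None, :, :])**2, axis=2)
    norm = (2.0 * np.pi * h**2) ** (-d / 2.0)
    q = norm * np.exp(-sq_dists / (2.0 * h**2)).mean(axis=1)
    return q, sq_dists, norm

def occupancy_lower_bound(sq_dists, norm, h, c=1.0):
    """Lower bound on q_x(X_i) from the count of particles within radius c*h."""
    N = sq_dists.shape[0]
    counts = (sq_dists <= (c * h)**2).sum(axis=1)  # includes the particle itself
    return counts / N * norm * np.exp(-c**2 / 2.0), counts

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))        # assumed toy particle cloud
h = 0.3
q, sq_dists, norm = kde_at_particles(X, h)
bound, counts = occupancy_lower_bound(sq_dists, norm, h)
worst_reciprocal = (1.0 / q).max()   # largest reciprocal-KDE value over particles
```

Because the self term alone contributes K_h(0)/N, the reciprocal never exceeds N/K_h(0); better occupancy tightens this to roughly N/(m_i K_h(0)) up to the factor exp(c^2/2).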

Methodology

Conservative velocity construction: Define the KDE q_x(z) = (1/N) ∑_{j=1}^N K_h(z - x_j) and the smoothed densities ν_h, μ_h. The conservative velocity is b_{ν,μ,h}(z) = ∇ log ν_h(z) - ∇ log μ_h(z), which is a gradient field.

Finite-particle dynamics: Model particle evolution via the ODEs dX_i(t)/dt = b_x(X_i(t)), where b_x depends on the full particle configuration through KDE scores.

Joint entropy and Stein divergence: Introduce the joint relative entropy H_N(t) = KL(P_N(t) || ρ^{⊗N}) to measure particle distribution divergence. Define the empirical Stein drift S_N and the smoothed Fisher discrepancy I_N to quantify convergence.

Entropy identity derivation: Prove d/dt H_N(t) = E_t[S_N(X_t)] + (ΔK(0)/N) E_t[R_N(X_t)], where R_N is the reciprocal-KDE term, highlighting the self-interaction correction.

Control of reciprocal-KDE term: Establish local occupancy conditions ensuring R_N remains bounded, preventing denominator singularities.

Quadrature error analysis: Track the constants A_{B,T}(h) and V_T(h) related to kernel derivatives, showing their dependence on bandwidth h and their impact on convergence rates.

Bandwidth optimization: Balance entropy initialization, self-interaction, and quadrature errors to select an optimal h* minimizing the residual velocity, yielding the convergence rate N^{-(2-β)/(2(d+4-β))}.

Non-conservative Laplace kernel analysis: Decompose the velocity into sharp companion kernel components, identify the scale-mismatch residual Δ_h, and derive finite-particle rates including unavoidable residual terms.

One-step generation guarantee: Translate continuous-time residual velocity bounds into explicit Wasserstein distance bounds for one-step generated distributions via the drift size η.
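The finite-particle dynamics above can be sketched with a forward Euler discretization of the ODE system. The paper analyzes the continuous-time flow, so the discretization, the Gaussian kernel, the step size, and the toy Gaussian data are all assumptions of this illustration.

```python
import numpy as np

def kde_scores(queries, points, h):
    """Gaussian-KDE score (gradient of log KDE on `points`) at each query row."""
    diffs = points[None, :, :] - queries[:, None, :]   # (n, m, d)
    log_w = -np.sum(diffs**2, axis=2) / (2.0 * h**2)
    log_w -= log_w.max(axis=1, keepdims=True)          # stabilise the softmax
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)
    return np.einsum("nm,nmd->nd", w, diffs) / h**2

def simulate(data, particles, h, dt, steps):
    """Forward Euler for dX_i/dt = score_data(X_i) - score_model(X_i)."""
    X = particles.copy()
    for _ in range(steps):
        b = kde_scores(X, data, h) - kde_scores(X, X, h)  # model KDE uses current X
        X = X + dt * b
    return X

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, size=(200, 2))   # toy target samples
init = rng.normal(loc=0.0, size=(200, 2))   # toy initial particles
final = simulate(data, init, h=1.0, dt=0.1, steps=200)

gap_before = np.linalg.norm(init.mean(axis=0) - data.mean(axis=0))
gap_after = np.linalg.norm(final.mean(axis=0) - data.mean(axis=0))
```

The model score is recomputed from the moving particles at every step, which is how the velocity depends on the full configuration. In this toy setting the particle mean moves toward the data mean, so `gap_after` is much smaller than `gap_before`.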

Experiments

The paper is primarily theoretical and does not present empirical experiments on specific datasets. The analysis applies to finite-particle systems in ℝ^d with arbitrary dimension d, sufficiently smooth kernels K_h for the conservative method, and bandwidth h. The authors compare Gaussian and Laplace kernels theoretically, showing how kernel smoothness affects conservatism and convergence. Assumptions on local occupancy and kernel smoothness guide practical algorithm design. Although no numerical simulations or benchmark datasets are reported, the theoretical results provide a basis for future empirical validation and algorithm development.

Results

The authors establish a finite-particle entropy identity linking the time derivative of the joint relative entropy to the empirical Stein drift and a reciprocal-KDE self-interaction term. Under suitable regularity and occupancy assumptions, they prove that the time-averaged empirical Stein drift is bounded by H_N(0)/(N T) plus a term proportional to 1/N. By relating the empirical Stein drift to the smoothed Fisher discrepancy and the squared center velocity via quadrature bounds, they derive explicit convergence rates. With bandwidth h chosen optimally, the root residual velocity converges at rate N^{-1/(d+4)} under h-uniform quadrature regularity, and more generally at N^{-(2-β)/(2(d+4-β))}, where 0 ≤ β < 2 parametrizes the growth condition. For the non-conservative Laplace kernel drifting method, the velocity decomposition reveals an unavoidable scale-mismatch residual Δ_h, which limits the achievable convergence rate. These results quantify the trade-offs among kernel choice, bandwidth selection, and finite-particle effects in drifting models.
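To get a feel for how the stated root rate depends on the dimension d and on β, the exponent can be tabulated directly. The helper names are illustrative, and `particles_for_tolerance` sets every constant in the bound to 1, which is an assumption made purely for illustration.

```python
def root_rate_exponent(d, beta=0.0):
    """Exponent a in the root residual-velocity rate N**(-a), a = (2-beta)/(2(d+4-beta))."""
    if d < 1 or not (0.0 <= beta < 2.0):
        raise ValueError("need d >= 1 and 0 <= beta < 2")
    return (2.0 - beta) / (2.0 * (d + 4.0 - beta))

def particles_for_tolerance(eps, d, beta=0.0):
    """N solving N**(-a) = eps, with all constants set to 1 (illustrative assumption)."""
    return eps ** (-1.0 / root_rate_exponent(d, beta))

# Exponents for a few dimensions and growth parameters.
table = {(d, beta): root_rate_exponent(d, beta)
         for d in (2, 8, 32) for beta in (0.0, 1.0)}
```

For d = 2 and β = 0 the exponent is 1/6, so reaching a tolerance of 0.1 takes about 10^6 particles under the unit-constant assumption. The exponent shrinks as either d or β grows.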

Applications

The conservative drifting method is applicable to designing efficient one-step generative models in domains requiring rapid sample generation with stability guarantees, such as image synthesis, speech generation, and reinforcement learning policy sampling. By restoring gradient field structure, the method enhances training stability and theoretical interpretability, facilitating integration into existing generative frameworks. The explicit analysis of bandwidth and particle number informs practical parameter tuning, improving robustness. Moreover, the theoretical insights into non-conservative methods guide kernel selection and highlight limitations, aiding practitioners in method choice. These contributions support deployment in real-time inference scenarios and high-dimensional generative tasks.

Limitations & Outlook

The method’s reliance on kernels with high smoothness excludes direct application to non-smooth kernels like the Laplace kernel, limiting generality. Controlling the reciprocal-KDE self-interaction term requires stringent local occupancy and bandwidth conditions, which may be difficult to satisfy in practice, potentially affecting numerical stability. The non-conservative Laplace drifting approach inherently contains scale-mismatch residuals that cannot be eliminated, restricting achievable convergence rates and generation quality.

Abstract

We propose and analyze a conservative drifting method for one-step generative modeling. The method replaces the original displacement-based drifting velocity by a kernel density estimator (KDE)-gradient velocity, namely the difference of the kernel-smoothed data score and the kernel-smoothed model score. This velocity is a gradient field, addressing the non-conservatism issue identified for general displacement-based drifting fields. We prove continuous-time finite-particle convergence bounds for the conservative method on $\mathbb{R}^d$: a joint-entropy identity yields bounds for the empirical Stein drift, the smoothed Fisher discrepancy of the KDE, and the squared center velocity. The main finite-particle correction is a reciprocal-KDE self-interaction term, and we give deterministic and high-probability local-occupancy conditions under which this term is controlled. We keep the quadrature constants explicit and track their possible bandwidth dependence: the root residual-velocity rate $N^{-1/(d+4)}$ holds under an additional $h$-uniform quadrature regularity condition, while a more general growth condition yields the optimized root rate $N^{-(2-\beta)/(2(d+4-\beta))}$, where $0\le \beta<2$. We also analyze the non-conservative drifting method with Laplace kernel, corresponding to the original displacement-based velocity proposed in Deng et al. (2026). For this method, a sharp companion kernel decomposes the velocity into a positive scalar preconditioning of a sharp-score mismatch plus a Laplace scale-mismatch residual, producing an analogous finite-particle rate with an unavoidable residual term. Finally, we explain how the continuous-time residual-velocity bounds translate into one-step generation guarantees through the explicit drift size $\eta$.

stat.ML cs.AI cs.LG math.ST
