From Scores to Gibbs Correctors: Accelerating Uniform-Rate Discrete Diffusion Models

TL;DR

Proposed GADD algorithm achieves O(polylog(ε⁻¹)) sampling complexity for uniform-rate discrete diffusion models, significantly accelerating sampling.

cs.LG 🔴 Advanced 2026-05-27 88 views
Yuchen Liang Ness Shroff Yingbin Liang
Discrete Diffusion Models Gibbs Sampling Uniform-Rate Sampling Acceleration Predictor-Corrector Methods

Key Findings

Methodology

This paper introduces the Gibbs-Accelerated Discrete Diffusion (GADD) algorithm, a novel Gibbs-based corrector designed for uniform-rate discrete diffusion models. GADD leverages the structure of the concrete score function to directly construct Gibbs posterior likelihoods without requiring any additional training beyond standard score estimation. The algorithm integrates a predictor-corrector framework where the predictor is typically an Euler discretization of the reverse diffusion process, and the corrector is a random-scan Gibbs sampler iteratively refining samples. Theoretical analysis demonstrates that GADD achieves an overall sampling complexity of O(polylog(ε⁻¹)), breaking the polynomial dependence barrier present in prior methods. This is enabled by exploiting the diffusion process’s natural warm start and the fast mixing properties of Gibbs sampling, resulting in exponentially decaying error across iterations.

Key Results

  • On synthetic datasets, GADD achieves substantially lower total variation error at the same number of function evaluations (NFE) compared to Euler and θ-Trapezoidal methods, validating the theoretical O(polylog(ε⁻¹)) complexity advantage.
  • In zero-shot text sampling on the WikiText103 dataset, GADD consistently attains the lowest perplexity across NFEs of 32, 64, 128, and 256, while also demonstrating reduced wall-clock time relative to Euler and CTMC correctors, highlighting practical efficiency.
  • For zero-shot conditional music generation on the Lakh pianoroll dataset, GADD significantly improves generation quality metrics such as Hellinger distance and next-token prediction error, showcasing its applicability to complex symbolic sequence generation.

Significance

This work addresses the longstanding challenge of sampling efficiency in uniform-rate discrete diffusion models, achieving the first theoretical guarantee of logarithmic dependence on target accuracy ε. By substantially reducing the computational burden, GADD enables practical deployment of discrete diffusion models in symbolic domains including natural language processing, music generation, and molecular design. The novel theoretical framework for analyzing predictor-corrector methods also advances the understanding of error propagation in discrete diffusion sampling, bridging a critical gap between theory and practice.

Technical Contribution

Technically, GADD innovates by directly utilizing the score function to construct Gibbs posterior probabilities, circumventing the need for costly likelihood ratio estimations required by Metropolis-Hastings-based correctors. Theoretical contributions include a new induction-based error propagation analysis for predictor-corrector schemes that eschews the traditional Girsanov change-of-measure approach, yielding the first O(polylog(ε⁻¹)) complexity bound for uniform-rate discrete diffusion samplers. Additionally, the framework generalizes to classical CTMC correctors, providing a unified theoretical perspective.

Novelty

This paper is the first to propose a Gibbs corrector based on the score function for uniform-rate discrete diffusion models, achieving a breakthrough sampling complexity of O(polylog(ε⁻¹)). Unlike prior acceleration methods that rely on auxiliary training or Metropolis-Hastings corrections, GADD requires only standard score estimation, simplifying implementation and improving scalability. The novel theoretical analysis method further distinguishes this work from existing literature.

Limitations

  • Performance depends critically on the accuracy of score function estimation; significant estimation errors can degrade sampling quality and convergence speed.
  • In high-dimensional, complex distributions (e.g., low-temperature Ising models), Gibbs sampling may mix slowly, limiting the acceleration benefits.
  • Current experiments focus on uniform-rate models; applicability and performance in non-uniform or hybrid diffusion settings remain to be explored.

Future Work

Future research directions include enhancing the robustness and precision of score estimation, integrating adaptive step sizes and multi-scale correction strategies to further boost sampling efficiency. Extending GADD to non-uniform rate and continuous-discrete hybrid diffusion models is another promising avenue. Theoretically, deepening the understanding of error propagation in predictor-corrector frameworks could inspire the design of higher-order numerical solvers for discrete diffusion sampling.

AI Executive Summary

Discrete diffusion models have emerged as powerful generative frameworks for symbolic data domains such as text, graphs, music, and molecular structures. Despite their empirical success, uniform-rate discrete diffusion models suffer from slow sampling speeds due to the need for many iterative steps to generate a single sample. Existing acceleration methods either require additional training or suffer from slow mixing, limiting their practical utility.

To address this challenge, the authors propose the Gibbs-Accelerated Discrete Diffusion (GADD) algorithm, a novel Gibbs-based corrector that directly leverages the structure of the score function to construct Gibbs posterior likelihoods without extra training. GADD operates within a predictor-corrector framework, where the predictor is typically an Euler discretization of the reverse diffusion process, and the corrector applies random-scan Gibbs sampling to iteratively refine samples. This design exploits the natural warm start provided by the diffusion process and the fast mixing properties of Gibbs sampling, enabling exponential error decay across iterations.

The core technical innovation lies in the direct use of the score function for constructing posterior conditionals in the Gibbs sampler, avoiding the computationally expensive likelihood ratio estimations required by traditional Metropolis-Hastings correctors. The authors develop a novel induction-based theoretical analysis that tracks error propagation through predictor and corrector steps, bypassing the limitations of Girsanov change-of-measure techniques. This analysis yields the first O(polylog(ε⁻¹)) sampling complexity guarantee for uniform-rate discrete diffusion models, representing a significant theoretical breakthrough.

Extensive experiments validate GADD’s advantages. On synthetic data, GADD achieves lower total variation error than Euler and θ-Trapezoidal baselines at equivalent function evaluation budgets. In zero-shot text generation on WikiText103, GADD attains the lowest perplexity and faster wall-clock times compared to Euler and CTMC correctors. For zero-shot conditional music generation on the Lakh pianoroll dataset, GADD improves generation quality metrics substantially, demonstrating robustness across challenging symbolic generation tasks.

This work not only advances the state-of-the-art in efficient discrete diffusion sampling but also provides a unified theoretical framework for analyzing predictor-corrector methods, with implications beyond the proposed algorithm. By significantly reducing sampling complexity, GADD paves the way for practical deployment of discrete diffusion models in diverse symbolic data domains.

Looking forward, future work may focus on improving score estimation accuracy, extending the approach to non-uniform and hybrid diffusion processes, and exploring higher-order numerical methods within the predictor-corrector paradigm. These directions promise to further enhance the efficiency and applicability of discrete diffusion generative models.

Deep Analysis

Background

Generative modeling is a fundamental problem in deep learning, aiming to learn data distributions to generate high-quality samples. Diffusion models have recently gained prominence due to their flexibility and state-of-the-art performance across various domains. Continuous diffusion models excel in image generation, while discrete diffusion models, operating directly on discrete state spaces, have shown strong empirical results in symbolic domains such as natural language processing, graph generation, music composition, and molecular design. These models define a forward noising process and a corresponding reverse denoising process, often modeled as continuous-time Markov chains (CTMCs). Despite their success, discrete diffusion models, especially uniform-rate variants, suffer from slow sampling speeds because generating a single sample requires many iterative steps, hindering real-world applications.

Core Problem

Uniform-rate discrete diffusion models lack the structural properties exploited by masking-based diffusion models, making acceleration challenging. Existing acceleration techniques either rely on training additional auxiliary quantities or suffer from slow mixing and high computational costs. Theoretical guarantees for deterministic-step samplers show polynomial dependence on the inverse target accuracy ε, with the best known rates being O(ε⁻¹). Breaking this polynomial barrier to achieve logarithmic dependence remains an open problem. Efficient sampling with provable guarantees is critical for deploying discrete diffusion models in practical symbolic generation tasks.

Innovation

This paper introduces several key innovations:

1) A Gibbs-based corrector (GADD) that directly uses the estimated score function to construct Gibbs posterior likelihoods, eliminating the need for auxiliary likelihood ratio estimation.

2) Integration of GADD into a predictor-corrector framework that leverages the diffusion process’s warm start and Gibbs sampler’s fast mixing to achieve exponential error decay.

3) A novel theoretical analysis based on induction over predictor steps that tracks error propagation in the presence of imperfect corrector updates, circumventing the limitations of Girsanov change-of-measure techniques.

4) Establishment of the first O(polylog(ε⁻¹)) sampling complexity bound for uniform-rate discrete diffusion models, significantly improving over prior polynomial rates.

5) Extension of the theoretical framework to classical CTMC correctors, providing a unified view of predictor-corrector convergence.

Methodology

  • �� Discrete Diffusion Model Setup: Define a forward noising process as a uniform-rate CTMC over discrete state space S^d, with vocabulary size S and sequence length d.
  • �� Score Function Estimation: Train a neural network to estimate the score function, i.e., the density ratio between neighboring states differing by one token.
  • �� Gibbs Corrector Construction: Use the estimated score to compute conditional posterior probabilities for single-site Gibbs updates, enabling random-scan Gibbs sampling without additional training.
  • �� Predictor Step: Apply Euler discretization to approximate the continuous-time reverse diffusion process, producing initial samples at each time step.
  • �� Corrector Step: Run L_k iterations of random-scan Gibbs sampling targeting the perturbed distribution at each discretization point t_k, refining samples to reduce error.
  • �� Theoretical Analysis: Employ an induction argument over predictor steps to bound total variation distance errors, decomposing errors into initialization, mixing, and estimation components.
  • �� Algorithm Variants: Support both random-scan and systematic-scan Gibbs updates, with informed weighting schemes to improve convergence.
  • �� Computational Efficiency: Leverage batch score computations to minimize calls to the score network during Gibbs updates, ensuring scalability.

Experiments

  • �� Synthetic Data: Compare GADD against Euler method, θ-Trapezoidal accelerated sampler, and vanilla Gibbs sampler, measuring total variation error versus number of function evaluations (NFE).
  • �� Text Data: Train a small SEDD Uniform model on WikiText103; evaluate zero-shot text generation perplexity and wall-clock time for GADD, Euler, θ-Trapezoidal, and CTMC correctors.
  • �� Music Data: Use Lakh pianoroll dataset for zero-shot conditional music generation, assessing Hellinger distance, perplexity, and next-token prediction error.
  • �� Metrics: Total variation distance, perplexity, Hellinger distance, prediction error, and computational runtime.
  • �� Ablation Studies: Analyze impact of number of Gibbs correction steps and choice of Gibbs scan strategy on performance.

Results

  • �� On synthetic data, GADD achieves significantly lower total variation error than Euler and θ-Trapezoidal methods at equivalent NFEs, confirming theoretical efficiency gains.
  • �� In zero-shot text sampling on WikiText103, GADD consistently yields the lowest perplexity across NFEs of 32, 64, 128, and 256, while also reducing wall-clock time compared to baselines.
  • �� For zero-shot conditional music generation on the Lakh dataset, GADD improves Hellinger distance and next-token prediction error metrics, demonstrating robustness and quality improvements.
  • �� Theoretical and empirical results align, showing that GADD’s Gibbs corrector effectively mitigates error accumulation and leverages score estimation accuracy.

Applications

  • �� Text Generation: Enables efficient zero-shot text sampling, improving speed and quality in natural language processing applications.
  • �� Music Composition: Facilitates conditional music generation, supporting automated composition and style transfer in digital music production.
  • �� Molecular and Graph Generation: Accelerates generation of discrete structured data, aiding drug discovery and material design.
  • �� Broader Symbolic Domains: Applicable to code generation, symbolic reasoning, and other discrete data generation tasks requiring efficient sampling.

Limitations & Outlook

  • �� The method’s effectiveness depends on the accuracy of the score function estimator; poor estimates can degrade performance.
  • �� Gibbs sampling may exhibit slow mixing in high-dimensional or low-temperature distributions, limiting acceleration.
  • �� Current work focuses on uniform-rate diffusion; extension to non-uniform or hybrid models requires further investigation.

Abstract

Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, but, especially for uniform-rate models, they often require many steps to generate a single sample. Existing acceleration methods either rely on training additional quantities or suffer from slow mixing. In this work, we propose a novel Gibbs-based corrector for discrete diffusion models, termed Gibbs-Accelerated Discrete Diffusion (GADD). GADD leverages the structure of the concrete score function to construct Gibbs posterior likelihoods directly, without requiring any additional training beyond standard score estimation. We show that GADD achieves an overall sampling complexity of $\mathcal{O}(\mathrm{polylog} (\varepsilon^{-1}))$, yielding the first such rate for diffusion-based samplers for uniform-rate discrete diffusion models. We also conduct numerical experiments demonstrating the practical advantages of GADD across synthetic data, zero-shot text sampling, and zero-shot conditional music generation. These results corroborate the theory and show that GADD consistently improves sample quality and wall-clock efficiency over standard baselines, including vanilla Euler methods and CTMC correctors. Beyond this, our theoretical analysis introduces a novel framework for analyzing predictor-corrector methods in discrete diffusion models, which may be of independent interest. Unlike existing approaches that rely on the Girsanov change-of-measure technique, our method is based on an induction argument that tracks error propagation across predictor iterations while accounting for inaccuracies in the corrector updates.

cs.LG stat.ML

References (20)

How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework

Yinuo Ren, Haoxuan Chen, Grant M. Rotskoff et al.

2024 41 citations ⭐ Influential View Analysis →

Informed Correctors for Discrete Diffusion Models

Yixiu Zhao, Jiaxin Shi, L. Mackey et al.

2024 48 citations ⭐ Influential View Analysis →

A Continuous Time Framework for Discrete Denoising Models

Andrew Campbell, Joe Benton, Valentin De Bortoli et al.

2022 400 citations ⭐ Influential View Analysis →

Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms

Yinuo Ren, Haoxuan Chen, Yuchen Zhu et al.

2025 39 citations ⭐ Influential View Analysis →

Discrete Diffusion Models: Novel Analysis and New Sampler Guarantees

Yuchen Liang, Yingbin Liang, Lifeng Lai et al.

2025 11 citations ⭐ Influential View Analysis →

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Aaron Lou, Chenlin Meng, Stefano Ermon

2023 549 citations ⭐ Influential View Analysis →

Markov Chains and Mixing Times: Second Edition

David A. Levin, Y. Peres

2017 572 citations ⭐ Influential

Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis

Zikun Zhang, Zixiang Chen, Quanquan Gu

2024 19 citations ⭐ Influential View Analysis →

Faster Diffusion Models via Higher-Order Approximation

Gen Li, Yuchen Zhou, Yuting Wei et al.

2025 16 citations View Analysis →

Reversibility and Stochastic Networks

A. Unwin

1980 1437 citations

Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling

Kaiwen Zheng, Yongxin Chen, Hanzi Mao et al.

2024 165 citations View Analysis →

Convergence Analysis of Discrete Diffusion Model: Exact Implementation through Uniformization

Hongrui Chen, Lexing Ying

2024 37 citations View Analysis →

Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers

Yuchen Liang, Peizhong Ju, Yingbin Liang et al.

2024 5 citations View Analysis →

Discrete Flow Matching

Itai Gat, Tal Remez, Neta Shaul et al.

2024 284 citations View Analysis →

Language Models are Unsupervised Multitask Learners

Alec Radford, Jeff Wu, R. Child et al.

2019 28738 citations

Structured Denoising Diffusion Models in Discrete State-Spaces

Jacob Austin, Daniel D. Johnson, Jonathan Ho et al.

2021 1795 citations View Analysis →

Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach

Yuchen Liang, Peizhong Ju, Yingbin Liang et al.

2024 13 citations View Analysis →

Non-Asymptotic Convergence of Discrete Diffusion Models: Masked and Random Walk dynamics

Giovanni Conforti, Alain Durmus, Le-Tuyet-Nhi Pham et al.

2025 5 citations View Analysis →

Stochastic Runge-Kutta Methods: Provable Acceleration of Diffusion Models

Yuchen Wu, Yuxin Chen, Yuting Wei

2024 29 citations View Analysis →

Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees

Joseph Gonzalez, Yucheng Low, Arthur Gretton et al.

2011 176 citations