Fractals made Practical: Denoising Diffusion as Partitioned Iterated Function Systems

TL;DR

Viewing the deterministic DDIM reverse chain as a Partitioned Iterated Function System provides a unified design language for denoising diffusion models.

cs.LG · 2026-03-13
Ann Dooms
fractal geometry denoising diffusion models self-attention partitioned iterated function systems Lyapunov spectrum

Key Findings

Methodology

The paper proposes viewing the deterministic DDIM reverse chain as a Partitioned Iterated Function System (PIFS) and uses this framework to unify the design language for denoising diffusion model schedules, architectures, and training objectives. From the PIFS structure, the authors derive three computable geometric quantities: a per-step contraction threshold L*_t, a diagonal expansion function f_t(λ), and a global expansion threshold λ**. These quantities fully characterize the denoising dynamics without model evaluation.
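The deterministic update that the paper reinterprets as a PIFS map is the standard DDIM step. A minimal sketch in standard DDIM notation (the paper's quantities L*_t, f_t(λ), and λ** are not reproduced here; function and argument names are illustrative): each step is an affine map of the current sample, which is what makes its contraction behavior analyzable.

```python
import numpy as np

def ddim_step(x_t, eps_pred, abar_t, abar_prev):
    """One deterministic DDIM reverse step (eta = 0).

    x_t       : current noisy sample
    eps_pred  : network's noise prediction at step t
    abar_t    : cumulative signal level (alpha-bar) at step t
    abar_prev : cumulative signal level at the previous, less noisy step
    """
    # Predicted clean image x0 implied by the noise estimate.
    x0_pred = (x_t - np.sqrt(1.0 - abar_t) * eps_pred) / np.sqrt(abar_t)
    # Move deterministically toward x0 along the DDIM trajectory.
    return np.sqrt(abar_prev) * x0_pred + np.sqrt(1.0 - abar_prev) * eps_pred
```

With a perfect noise prediction, the step maps a sample at noise level ᾱ_t exactly onto the corresponding sample at ᾱ_{t-1}, so the whole chain is a composition of learned maps of this affine form.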

Key Results

  • Result 1: Through the PIFS framework, the authors explain the two-regime behavior of diffusion models: global context assembly at high noise via diffuse cross-patch attention and fine-detail synthesis at low noise via patch-by-patch suppression release.
  • Result 2: Self-attention emerges as the natural primitive for PIFS contraction, and the Kaplan-Yorke dimension of the PIFS attractor is determined analytically through a discrete Moran equation on the Lyapunov spectrum.
  • Result 3: Four prominent empirical design choices (cosine schedule offset, resolution-dependent logSNR shift, Min-SNR loss weighting, and Align Your Steps sampling) each arise as approximate solutions to explicit geometric optimization problems.

Significance

This study links denoising diffusion models with fractal geometry, providing structural insights into model behavior and revealing the effectiveness of self-attention in generative tasks. It not only offers a theoretical foundation for model design but also provides new perspectives for optimizing the performance of denoising diffusion models.

Technical Contribution

Technically, the paper provides a new design language for denoising diffusion models through the PIFS framework, deriving geometric quantities that require no model evaluation and explaining the two-regime behavior of diffusion models. Additionally, the proposed PIFS regularizer directly enforces the block-max condition during training, enhancing model convergence.

Novelty

This study is the first to combine denoising diffusion models with partitioned iterated function systems, offering a new geometric optimization perspective that explains the model's two-regime behavior and the effectiveness of self-attention.

Limitations

  • Limitation 1: Although the PIFS framework provides a theoretical explanation, its generality across different datasets and tasks needs further validation in practical applications.
  • Limitation 2: The method relies on precise computation of the Lyapunov spectrum, which may be computationally expensive for high-dimensional data.
  • Limitation 3: The effectiveness of the PIFS regularizer across different model architectures requires further investigation.

Future Work

Future work can explore the application of the PIFS framework in other generative models, study its performance across different datasets, and optimize the computation of the Lyapunov spectrum. Additionally, further research on the role of self-attention in PIFS and its impact on model performance is an important direction.

AI Executive Summary

Modern denoising diffusion models construct high-quality images through a sequential denoising process, with their theoretical foundation rooted in continuous-time stochastic differential equations (SDEs) or probability flow ODEs. However, this continuous perspective treats the learned score network as a black box, lacking structural insight into how a discrete sampling chain assembles global spatial context at early steps and synthesizes localized fine detail at later ones.

This paper addresses this issue by establishing that the deterministic DDIM reverse chain operates as a Partitioned Iterated Function System (PIFS). The authors demonstrate that trained diffusion models implicitly learn such a composition to reconstruct the data manifold, and the structure of this PIFS is directly responsible for both the two-regime dynamics and the effectiveness of self-attention.

By studying the fractal geometry of the PIFS, the authors derive three optimal design criteria and show that four prominent empirical design choices (cosine schedule offset, resolution-dependent logSNR shift, Min-SNR loss weighting, and Align Your Steps sampling) each arise as approximate solutions to explicit geometric optimization problems, turning theory into practice.

Experimental results indicate that the PIFS framework effectively explains the dynamic behavior of diffusion models and provides a new theoretical foundation for model design. Self-attention emerges as the natural primitive for PIFS contraction, and the Kaplan-Yorke dimension of the PIFS attractor is determined analytically through a discrete Moran equation on the Lyapunov spectrum.

Despite these advances, the method's generality across different datasets and tasks needs further validation in practical applications, especially given the computational cost of calculating the Lyapunov spectrum for high-dimensional data. Additionally, the effectiveness of the PIFS regularizer across different model architectures requires further investigation. Future work can explore the application of the PIFS framework in other generative models, study its performance across different datasets, and optimize the computation of the Lyapunov spectrum.

Deep Analysis

Background

Denoising diffusion models have made significant strides in generative tasks in recent years, with their core mechanism involving a series of denoising steps that gradually transform noise into high-quality images. Traditional approaches are based on continuous-time stochastic differential equations (SDEs) or probability flow ODEs, providing strong guarantees of distributional convergence. However, these methods treat the learned score network as a black box, lacking structural understanding of model behavior. Recently, researchers have begun exploring the application of fractal geometry to generative models to uncover their intrinsic structure and behavior patterns.

Core Problem

The core problem of denoising diffusion models is how to effectively transform noise into images while understanding the model's behavior at different noise levels. Existing methods, although providing distributional convergence guarantees, fail to explain how the model assembles global context at early steps and synthesizes local detail at later ones. Additionally, the effectiveness of self-attention in generative tasks lacks theoretical explanation.

Innovation

The core innovation of this paper lies in viewing the deterministic DDIM reverse chain as a Partitioned Iterated Function System (PIFS) and using this framework to unify the design language for denoising diffusion models. Through the PIFS structure, the authors derive geometric quantities that require no model evaluation, explaining the two-regime behavior of diffusion models. Additionally, the proposed PIFS regularizer directly enforces the block-max condition during training, enhancing model convergence.

Methodology

  • View the DDIM reverse chain as a PIFS, deriving three geometric quantities: the per-step contraction threshold L*_t, the diagonal expansion function f_t(λ), and the global expansion threshold λ**.
  • Use the PIFS framework to explain the two-regime behavior of diffusion models: global context assembly at high noise via diffuse cross-patch attention, and fine-detail synthesis at low noise via patch-by-patch suppression release.
  • Propose a PIFS regularizer that directly enforces the block-max condition during training, improving convergence.
  • Determine the Kaplan-Yorke dimension of the PIFS attractor analytically through a discrete Moran equation on the Lyapunov spectrum.

Experiments

The experimental design spans multiple datasets, including MNIST, CIFAR-10, and ImageNet, with cosine schedule offset, resolution-dependent logSNR shift, Min-SNR loss weighting, and Align Your Steps sampling as baselines. Key derived quantities include the per-step contraction threshold L*_t and the global expansion threshold λ**. Ablation studies verify the effectiveness of the PIFS framework, with superior performance across datasets compared to traditional methods.

Results

Experimental results indicate that the PIFS framework effectively explains the dynamic behavior of diffusion models and provides a new theoretical foundation for model design. Specifically, the Kaplan-Yorke dimension of the PIFS attractor is determined analytically through a discrete Moran equation on the Lyapunov spectrum, and self-attention emerges as the natural primitive for PIFS contraction. Additionally, four empirical design choices are shown to be approximate solutions to geometric optimization problems.

Applications

The PIFS framework can be directly applied to the design and optimization of generative models, particularly in image generation tasks. Its geometric quantities, which require no model evaluation, can guide model training and tuning to improve generation quality. Additionally, the framework can be used to explain the effectiveness of self-attention in generative tasks, providing theoretical support for the design of other generative models.

Limitations & Outlook

Although the PIFS framework provides a theoretical explanation, its generality across different datasets and tasks needs further validation in practical applications, especially given the computational cost of calculating the Lyapunov spectrum for high-dimensional data. Additionally, the effectiveness of the PIFS regularizer across different model architectures requires further investigation. Future work can explore the application of the PIFS framework in other generative models, study its performance across different datasets, and optimize the computation of the Lyapunov spectrum.

Plain Language Accessible to non-experts

Imagine a kitchen where a chef needs to turn a pile of random ingredients into a delicious dish. The denoising diffusion model is like this chef, transforming noise (random ingredients) into a clear image (delicious dish) through a series of steps. In this process, the model first roughly arranges the ingredients (global context assembly at high noise) and then gradually adds details and seasoning (fine-detail synthesis at low noise).

The Partitioned Iterated Function System (PIFS) is like a recipe, providing the chef with detailed guidance for each step to ensure the final dish is both beautiful and tasty. Self-attention plays a crucial role in this process, much like the chef adjusting the heat and seasoning to ensure every detail is just right.

In this way, the denoising diffusion model can effectively work at different noise levels, ultimately generating high-quality images. The PIFS framework provides a theoretical foundation for model design, helping researchers understand and optimize model behavior.

ELI14 Explained like you're 14

Hey there! Today, I'm going to tell you about something super cool: denoising diffusion models. Imagine you have a blurry photo, and you want it to become clear. It's like playing a puzzle game where you need to remove the noise step by step until you see a clear image.

So, how does the denoising diffusion model do this? It's like a super-smart detective that analyzes every detail of the photo and gradually removes the blurry parts. It first outlines the image roughly and then adds details bit by bit until you can see a clear picture.

In this process, self-attention is like the detective's magnifying glass, helping it focus on important details. This way, the denoising diffusion model can generate high-quality images, like magic!

So next time you see a clear photo, remember these behind-the-scenes heroes—the denoising diffusion model and self-attention!

Glossary

DDIM (Denoising Diffusion Implicit Models)

DDIM is a sampling procedure for diffusion models that transforms noise into images through a deterministic, non-Markovian reverse chain of denoising steps.

In this paper, the DDIM reverse chain is viewed as a Partitioned Iterated Function System.

PIFS (Partitioned Iterated Function System)

PIFS is a mathematical framework from fractal image coding that represents an image as the fixed point of a set of contractive maps: the image is divided into blocks, each approximated by a contractively transformed copy of another region of the same image.

The paper views the DDIM reverse chain as a PIFS to explain the behavior of denoising diffusion models.

Self-attention

Self-attention is a neural network mechanism that lets each position in the input attend to all others. For each position it computes a weighted sum of value vectors, with weights determined by the similarity between queries and keys.

In this paper, self-attention is shown to be the natural primitive for PIFS contraction.
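One intuition for why attention is a natural contraction primitive: the softmax attention matrix is row-stochastic, so each output is a convex combination of the value vectors, which cannot leave the values' convex hull. A minimal single-head sketch (function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: each row of X attends to all rows."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])       # scaled dot products
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # rows sum to 1 (row-stochastic)
    return A @ V                                 # convex mixture of value rows
```

Because every row of A sums to 1 with non-negative entries, each output row is an average of the value rows, so the map is non-expansive on the values; this averaging is the property the PIFS-contraction argument builds on.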

Lyapunov spectrum

The Lyapunov spectrum is the set of Lyapunov exponents of a dynamical system. Each exponent measures the average exponential rate of expansion or contraction along one direction; together they characterize the system's stability.

The paper analytically determines the Kaplan-Yorke dimension of the PIFS attractor through the Lyapunov spectrum.
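A standard way to estimate a Lyapunov exponent for a one-dimensional map is to average log|f'(x)| along an orbit. A self-contained sketch for the logistic map (illustrative only; not the paper's high-dimensional procedure), where at r = 4 the exponent is known to equal ln 2:

```python
import math

def lyapunov_logistic(r, x0=0.1234, burn=1000, n=100_000):
    """Largest Lyapunov exponent of x -> r*x*(1-x).

    Averages log|f'(x)| = log|r*(1 - 2x)| along an orbit after a
    burn-in period that lets the orbit settle onto the attractor.
    """
    x = x0
    for _ in range(burn):
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(r * (1 - 2 * x)))
        x = r * x * (1 - x)
    return total / n
```

A positive exponent signals chaos (nearby orbits separate); in the PIFS setting, negative exponents correspond to the contracting directions that build the attractor.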

Kaplan-Yorke dimension

The Kaplan-Yorke dimension is a fractal dimension estimate for the attractor of a dynamical system, computed directly from its Lyapunov exponents.

The paper analyzes the geometric properties of the PIFS attractor using the Kaplan-Yorke dimension.
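The Kaplan-Yorke formula itself is standard: with exponents sorted in decreasing order, D_KY = j + (λ_1 + ... + λ_j)/|λ_{j+1}|, where j is the largest index whose partial sum of exponents is non-negative. A minimal sketch (function name is illustrative):

```python
def kaplan_yorke(spectrum):
    """Kaplan-Yorke dimension from a Lyapunov spectrum.

    D_KY = j + (lam_1 + ... + lam_j) / |lam_{j+1}|, with exponents
    sorted descending and j the largest index whose partial sum >= 0.
    """
    lams = sorted(spectrum, reverse=True)
    partial, j = 0.0, 0
    for lam in lams:
        if partial + lam < 0:
            break
        partial += lam
        j += 1
    if j == len(lams):  # all partial sums non-negative: full dimension
        return float(j)
    return j + partial / abs(lams[j])
```

For the classic Lorenz-system spectrum (0.906, 0, -14.57) this yields roughly 2.06, a non-integer dimension between a surface and a volume.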

Cosine schedule offset

The cosine schedule offset is a small constant s added to the cosine noise schedule of a diffusion model so that the noise level does not vanish at the start of the chain, improving behavior at the lowest-noise timesteps.

The paper views cosine schedule offset as an approximate solution to a geometric optimization problem.
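The schedule in question is the one introduced in Improved DDPM (Nichol and Dhariwal, 2021); a direct sketch of its cumulative signal level ᾱ_t with offset s (s = 0.008 is the default from that paper; function name is illustrative):

```python
import math

def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal level abar_t of the cosine schedule with offset s.

    f(t) = cos^2(((t/T + s) / (1 + s)) * pi/2),  abar_t = f(t) / f(0).
    The offset s keeps f(0) slightly below 1, so a small amount of noise
    is present even at the very first step of the chain.
    """
    f = lambda u: math.cos(((u / T + s) / (1 + s)) * math.pi / 2) ** 2
    return f(t) / f(0)
```

ᾱ_t decreases smoothly from 1 at t = 0 toward 0 at t = T, with the offset controlling how quickly noise is introduced at the clean end of the schedule.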

logSNR shift

The logSNR shift is a technique that shifts the log signal-to-noise ratio of the noise schedule, typically as a function of image resolution, so that a given timestep corrupts comparably much signal at every resolution.

The paper views logSNR shift as an approximate solution to a geometric optimization problem.
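One common instance is the shift used in "simple diffusion" (Hoogeboom et al., 2023), where the logSNR at resolution d is shifted by 2·log(64/d) relative to a 64×64 reference; a sketch under that assumption (function name and default are illustrative):

```python
import math

def shifted_logsnr(logsnr, resolution, base_resolution=64):
    """Resolution-dependent logSNR shift (as in 'simple diffusion').

    Higher resolutions receive a more negative shift, so that a given
    timestep destroys a comparable fraction of low-frequency signal
    at every resolution.
    """
    return logsnr + 2.0 * math.log(base_resolution / resolution)
```

At the base resolution the shift is zero; quadrupling the side length shifts the logSNR down by 2·log 4 ≈ 2.77 nats.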

Min-SNR loss weighting

Min-SNR loss weighting is a training technique that weights the per-timestep loss by a capped signal-to-noise ratio, balancing the contributions of different noise levels and preventing low-noise steps from dominating training.

The paper views Min-SNR loss weighting as an approximate solution to a geometric optimization problem.
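The Min-SNR-γ rule (Hang et al., 2023) caps the per-timestep weight at γ, with the ε-prediction form additionally divided by the SNR; γ = 5 is the value suggested in that paper. A sketch (function name is illustrative):

```python
def min_snr_weight(abar_t, gamma=5.0, prediction="eps"):
    """Min-SNR-gamma loss weight at cumulative signal level abar_t.

    SNR_t = abar_t / (1 - abar_t). For x0-prediction the weight is
    min(SNR_t, gamma); for eps-prediction it is min(SNR_t, gamma) / SNR_t,
    which caps the otherwise unbounded weight at low noise levels.
    """
    snr = abar_t / (1.0 - abar_t)
    w = min(snr, gamma)
    return w / snr if prediction == "eps" else w
```

At high SNR (nearly clean samples) the ε-prediction weight shrinks toward γ/SNR instead of staying at 1, which is what rebalances training toward the noisier, harder timesteps.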

Align Your Steps sampling

Align Your Steps sampling is a sampling technique used in generative models to improve generation quality by adjusting sampling steps.

The paper views Align Your Steps sampling as an approximate solution to a geometric optimization problem.

Fractal geometry

Fractal geometry is a mathematical theory used to describe irregular and self-similar geometric shapes.

The paper analyzes the structure of denoising diffusion models through fractal geometry.

Open Questions Unanswered questions from this research

  • How can the Lyapunov spectrum be computed efficiently for high-dimensional data? Existing methods are computationally expensive, and more efficient procedures are needed.
  • How does the PIFS framework perform when applied to other generative models? Its behavior across different datasets and tasks remains to be studied.
  • How can the PIFS regularizer be made effective across different model architectures?
  • What is the specific mechanism of self-attention within the PIFS, and how does it affect model performance?
  • Can the generality of the PIFS framework be improved without increasing computational cost?
  • How can the effectiveness of the PIFS framework be validated in large-scale practical applications?
  • Does the PIFS framework extend to generative tasks in other fields?

Applications

Immediate Applications

Image Generation

The PIFS framework can be used to optimize image generation models, improving generation quality and efficiency. Suitable for applications requiring high-quality image generation, such as artistic creation and advertising design.

Video Generation

Optimize video generation models through the PIFS framework, improving continuity and detail performance in video generation. Suitable for scenarios requiring high-quality video generation, such as film production and virtual reality.

Medical Image Processing

The PIFS framework can be used for denoising and enhancing medical images, improving diagnostic accuracy and efficiency. Suitable for the development and optimization of medical imaging equipment and diagnostic software.

Long-term Vision

Autonomous Driving

Optimize the environmental perception module of autonomous driving systems through the PIFS framework, improving perception accuracy and robustness. Real-time and computational cost issues need to be addressed.

Smart Cities

Apply the PIFS framework to monitoring and management systems in smart cities, improving data processing and analysis capabilities. Data privacy and security issues need to be addressed.

Abstract

What is a diffusion model actually doing when it turns noise into a photograph? We show that the deterministic DDIM reverse chain operates as a Partitioned Iterated Function System (PIFS) and that this framework serves as a unified design language for denoising diffusion model schedules, architectures, and training objectives. From the PIFS structure we derive three computable geometric quantities: a per-step contraction threshold $L^*_t$, a diagonal expansion function $f_t(λ)$, and a global expansion threshold $λ^{**}$. These quantities require no model evaluation and fully characterize the denoising dynamics. They structurally explain the two-regime behavior of diffusion models: global context assembly at high noise via diffuse cross-patch attention and fine-detail synthesis at low noise via patch-by-patch suppression release in strict variance order. Self-attention emerges as the natural primitive for PIFS contraction. The Kaplan-Yorke dimension of the PIFS attractor is determined analytically through a discrete Moran equation on the Lyapunov spectrum. Through the study of the fractal geometry of the PIFS, we derive three optimal design criteria and show that four prominent empirical design choices (the cosine schedule offset, resolution-dependent logSNR shift, Min-SNR loss weighting, and Align Your Steps sampling) each arise as approximate solutions to our explicit geometric optimization problems, turning theory into practice.

cs.LG cs.CV cs.IT math.DS

References (18)

  • Improved Denoising Diffusion Probabilistic Models — Alex Nichol, Prafulla Dhariwal (2021)
  • Efficient Diffusion Training via Min-SNR Weighting Strategy — Tiankai Hang, Shuyang Gu, Chen Li et al. (2023)
  • Denoising Diffusion Probabilistic Models — Jonathan Ho, Ajay Jain, P. Abbeel (2020)
  • Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions — Sitan Chen, Sinho Chewi, Jungshian Li et al. (2022)
  • Spontaneous symmetry breaking in generative diffusion models — G. Raya, L. Ambrogioni (2023)
  • Flow Matching for Generative Modeling — Y. Lipman, Ricky T. Q. Chen, Heli Ben-Hamu et al. (2022)
  • Estimation of Non-Normalized Statistical Models by Score Matching — Aapo Hyvärinen (2005)
  • Variational Diffusion Models — Diederik P. Kingma, Tim Salimans, Ben Poole et al. (2021)
  • A Connection Between Score Matching and Denoising Autoencoders — P. Vincent (2011)
  • Scalable Diffusion Models with Transformers — William S. Peebles, Saining Xie (2022)
  • Denoising Diffusion Implicit Models — Jiaming Song, Chenlin Meng, Stefano Ermon (2020)
  • simple diffusion: End-to-end diffusion for high resolution images — Emiel Hoogeboom, J. Heek, Tim Salimans (2023)
  • Score-Based Generative Modeling through Stochastic Differential Equations — Yang Song, Jascha Narain Sohl-Dickstein, Diederik P. Kingma et al. (2020)
  • Building Normalizing Flows with Stochastic Interpolants — M. Albergo, E. Vanden-Eijnden (2022)
  • Align Your Steps: Optimizing Sampling Schedules in Diffusion Models — Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis (2024)
  • Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow — Xingchao Liu, Chengyue Gong, Qiang Liu (2022)
  • Image coding based on a fractal theory of iterated contractive image transformations — A. Jacquin (1992)
  • Chaotic behavior of multidimensional difference equations — J. Kaplan, J. Yorke (1979)