Itô maps for any-step SDEs
Introduces Itô maps for arbitrary-step SDE sampling, enabling conditional sampling and control, enhancing diversity and efficiency.
Key Findings
Methodology
This paper proposes a framework based on Itô maps for predicting paths of arbitrary-step SDEs. It uses low-dimensional representations of Brownian paths (Karhunen–Loève expansion and Haar wavelets) to enable single-pass prediction for high-dimensional data. The model learns the stochastic flow map Φ_{s,t}, which takes an intermediate state x_s and a Brownian path W and produces the future state x_t, preserving stochasticity and uncertainty. Training employs self-distillation strategies, including Lagrangian Self-Distillation (LSD) and Progressive Self-Distillation (LPSD), to enforce path consistency across different time points. The approach incorporates Bayesian path estimators (BEL series) and gradient estimators (Itô-G, Itô-GF) for inference-time control, supporting reward-guided sampling. Experiments on synthetic, image, and posterior sampling tasks are reported to outperform the compared baselines.
Key Results
- In a 1D Gaussian mixture setting, the Itô map accurately reproduces SDE trajectories with a mean RMSE of approximately 6.92×10^-2, outperforming traditional methods. In 2D posterior sampling, the Itô-G method achieves a SW2 distance of 0.16 and an MMD of 0.024, surpassing baselines like MFM-G (SW2=0.51, MMD=0.38) and DPS (SW2=1.95). On MNIST, the pixel-space MSE drops to 0.05, confirming precise path prediction. For control tasks, the model successfully modulates class proportions with KL divergence as low as 0.024, demonstrating strong reward-guided capabilities. These results highlight the method’s robustness and versatility across tasks.
- Low-dimensional Brownian features (Karhunen–Loève and Haar wavelets) effectively compress path information, reducing computational complexity while maintaining stochastic diversity. The model demonstrates high-quality, diverse conditional samples and precise control in high-dimensional image generation and posterior inference. The experimental results validate the theoretical advantages of path-wise stochastic flow maps, showing significant improvements over existing deterministic or partial stochastic models.
Significance
This work advances the field by bridging the gap between deterministic flow models and stochastic dynamics, providing a mathematically principled way to model the residual uncertainty in path predictions. It introduces a novel class of stochastic flow maps that support exact conditional laws, enabling efficient posterior sampling and reward-based control in high-dimensional data. The approach addresses fundamental limitations of existing methods, such as the inability to represent the full posterior distribution from intermediate states, thus opening new avenues for scalable, flexible, and diverse generative modeling. Its potential impact spans image synthesis, Bayesian inference, and reinforcement learning, offering a unified framework for stochastic path prediction and control that is both theoretically sound and practically feasible.
Technical Contribution
The paper’s key technical contributions include: 1) formalizing the Itô map as a pathwise stochastic flow map conditioned on Brownian paths, enabling arbitrary-step predictions; 2) developing low-dimensional Brownian path representations via Karhunen–Loève expansion and Haar wavelet decomposition, facilitating scalable training; 3) proposing a self-distillation training scheme that enforces path consistency across multiple time points, ensuring pathwise accuracy; 4) deriving Bayesian path estimators (BEL series) and gradient estimators (Itô-G, Itô-GF) for reward-guided control, grounded in stochastic calculus and pathwise Jacobians; 5) validating the approach through extensive experiments on synthetic, image, and posterior sampling benchmarks, demonstrating superior accuracy, diversity, and control capabilities compared to existing methods.
Novelty
This work is the first to systematically introduce Itô maps for arbitrary-step stochastic path prediction in high-dimensional generative models. Unlike prior deterministic flow models or endpoint-only stochastic samplers, it explicitly models the full conditional law between intermediate states and the endpoint, leveraging pathwise Brownian information. The integration of low-dimensional path features with self-distillation training and Bayesian control estimators represents a significant innovation, enabling scalable, flexible, and diverse path sampling with theoretical guarantees. This approach fundamentally shifts the paradigm from deterministic transport to stochastic pathwise modeling, opening new possibilities for probabilistic inference and control in deep generative modeling.
Limitations
- Despite the efficiency gains, training on very high-dimensional data (e.g., high-resolution images) remains computationally intensive, especially when extracting and optimizing low-dimensional path features.
- The current low-dimensional Brownian representations (e.g., 5 KL modes) may limit the expressiveness for extremely complex or non-Gaussian paths, requiring adaptive feature selection strategies.
- Model robustness in highly noisy or non-Gaussian environments has not been fully explored, and performance may degrade under such conditions. Further research is needed to extend the framework to broader noise models and dynamics.
Future Work
Future research will focus on adaptive, multi-scale path feature learning to enhance expressiveness and efficiency. Integrating reinforcement learning techniques could optimize path control policies for complex tasks like robotics and autonomous driving. Extending the theoretical analysis to guarantee convergence and stability in more general stochastic environments is also a priority. Additionally, exploring applications in multimodal data, such as video and 3D point clouds, can broaden the impact of this framework. Combining the Itô map approach with large-scale pretraining and transfer learning may further improve scalability and real-world applicability.
AI Executive Summary
The evolution of deep generative models has revolutionized the way we synthesize complex data like images and audio. Techniques such as diffusion models and score-based methods have demonstrated remarkable success, yet they often rely on multi-step iterative processes that are computationally expensive and slow. These methods typically approximate the data distribution through deterministic flow maps or partial stochastic dynamics, which limits their ability to model the full uncertainty inherent in the generative process. As a result, the community has sought more efficient, flexible, and probabilistically faithful sampling methods.
This paper introduces a groundbreaking approach based on Itô maps, a class of stochastic flow maps capable of predicting the evolution of data along arbitrary time steps conditioned on Brownian paths. Unlike traditional deterministic flow models, Itô maps explicitly incorporate the stochasticity of the underlying SDEs, enabling the modeling of the full conditional distribution from intermediate states to the final data point. This innovation addresses a critical gap: the inability of existing models to represent the residual uncertainty and diversity in the generated samples, especially when conditioning on noisy or partial information.
The core idea hinges on representing the Brownian motion driving the SDE with low-dimensional features, such as Karhunen–Loève modes and Haar wavelet coefficients. These features capture the essential randomness with minimal complexity, making the learning process computationally feasible even in high-dimensional data spaces. The authors develop a training framework based on self-distillation, which enforces pathwise consistency across different time points, ensuring that the learned stochastic flow respects the underlying SDE dynamics. This approach involves optimizing both the local drift component (G_{t,t}) and the global pathwise component (G_{s,t}), using novel loss functions like LSD and LPSD.
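As a rough illustration of the Haar wavelet features mentioned above, the sketch below applies an orthonormal Haar transform to the increments of a simulated Brownian path and keeps only the coarsest coefficients. Applying the transform to increments (rather than to the path itself), the grid size, and the names `haar_transform` and `inverse_haar_transform` are all assumptions made for this example, not details confirmed by the paper.

```python
import numpy as np

def haar_transform(x):
    """Orthonormal Haar transform of a length-2^J vector.

    Output order is coarse to fine: [scaling coeff, coarsest detail, ..., finest details].
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    details = []
    approx = x
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # fluctuation at this scale
        approx = (even + odd) / np.sqrt(2.0)         # smoothed signal for next scale
    return np.concatenate([approx] + details[::-1])

def inverse_haar_transform(c):
    """Inverse of haar_transform (same coarse-to-fine ordering)."""
    c = np.asarray(c, dtype=float)
    approx, pos = c[:1], 1
    while pos < c.size:
        detail = c[pos:pos + approx.size]
        pos += approx.size
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

# Simulate Brownian increments on [0, 1] with n = 64 steps (illustrative sizes).
rng = np.random.default_rng(0)
n, n_keep = 64, 8
dW = rng.normal(0.0, np.sqrt(1.0 / n), size=n)
path = np.cumsum(dW)

coeffs = haar_transform(dW)
features = coeffs[:n_keep]            # low-dimensional summary of the path

# Reconstruct a coarse path from the kept coefficients only.
truncated = np.zeros(n)
truncated[:n_keep] = features
coarse_path = np.cumsum(inverse_haar_transform(truncated))
```

Keeping the first 8 of 64 coefficients preserves the path exactly at the 8 coarse grid points (every 8th step) and discards the finer fluctuations between them, which is the sense in which the features are multi-scale.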
Experimental validation spans synthetic and real-world tasks. In one-dimensional Gaussian mixtures, the model accurately reproduces trajectories with errors as low as 6.92×10^-2. In more complex two-dimensional posterior sampling, the method achieves SW2 distances of 0.16, outperforming baselines like MFM-G and DPS. In high-dimensional image generation tasks such as MNIST and ImageNet, the model demonstrates excellent path prediction accuracy and effective control over sample attributes, such as class proportions, with KL divergences below 0.025. These results confirm that the Itô map framework not only accelerates sampling but also enhances the diversity and controllability of generated data.
Beyond technical achievements, this work opens new avenues for probabilistic modeling, stochastic control, and reinforcement learning. By providing a mathematically rigorous and computationally practical tool for pathwise stochastic prediction, it paves the way for more flexible, robust, and efficient generative systems. The ability to perform exact conditional sampling from intermediate states, combined with low-cost control estimators, makes this approach highly promising for applications requiring real-time decision-making, personalized content creation, and autonomous systems. Looking forward, the authors plan to explore multi-scale path features, adaptive control strategies, and broader applications in multimodal data, aiming to further push the boundaries of stochastic deep generative modeling.
Deep Analysis
Background
The field of deep generative modeling has experienced rapid growth, driven by models such as Variational Autoencoders, Generative Adversarial Networks, and more recently, diffusion and score-based models. Diffusion models such as DDPM (Denoising Diffusion Probabilistic Models) and the score-based SDE formulation of Song et al. have achieved state-of-the-art results in image synthesis, using iterative denoising processes that approximate complex data distributions. These models typically rely on discretized SDEs or ODEs to define the evolution from noise to data, so sampling involves many sequential steps. While effective, this is computationally intensive, which limits real-time applications.
Recent advances have focused on distilling multi-step samplers into fewer steps or even single-step mappings, such as flow-matching and flow-map distillation techniques. These methods aim to compress the continuous-time dynamics into direct maps, often deterministic, which can generate samples rapidly but at the expense of losing stochasticity and the ability to model residual uncertainty. Score-based models inherently involve stochasticity through their SDE formulation, but most existing one-step models approximate the stochastic transition as a deterministic map, thus failing to capture the full posterior distribution conditioned on intermediate states.
The challenge remains in modeling the residual uncertainty and diversity in the generative process, especially for applications like posterior sampling, stochastic control, and conditional generation. Existing methods struggle to represent the full conditional law p_{1|t}(· | xt), which encodes the distribution of the endpoint conditioned on an intermediate noisy state. This gap motivates the development of a stochastic, pathwise approach that can accurately model the evolution of data along the entire stochastic trajectory, supporting both diversity and control.
Core Problem
The core problem addressed in this work is the inability of existing one-step generative models to accurately represent the full conditional distribution of the data endpoint given an intermediate noisy state. Deterministic flow maps, while fast, do not encode the residual uncertainty, limiting their use in posterior sampling and stochastic control. Conversely, stochastic models based on SDEs retain uncertainty but are computationally expensive and difficult to implement efficiently at arbitrary steps. The key challenge is to develop a method that can perform accurate, efficient, and flexible path prediction conditioned on partial information, while preserving the stochastic nature of the underlying dynamics. This is particularly important for applications requiring diverse samples, precise control, and real-time inference, such as image editing, data augmentation, and autonomous decision-making. The problem is compounded by the high dimensionality of data like images and videos, which makes path modeling computationally challenging. Therefore, a scalable, low-dimensional representation of the stochastic paths, combined with a theoretically sound predictive model, is essential to address these issues.
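The contrast above between a deterministic map and a path-conditioned one can be illustrated with a toy simulation. This is not the paper's model: it assumes a hypothetical Ornstein–Uhlenbeck SDE and plain Euler–Maruyama integration, and the helper name `euler_maruyama` is invented for the example. It shows that a fixed intermediate state yields a spread of endpoints across different Brownian paths, while fixing the Brownian path makes the endpoint a deterministic function of the state.

```python
import numpy as np

def euler_maruyama(x0, dW, t0, dt, drift, sigma):
    """Integrate dX = drift(X, t) dt + sigma(t) dW along the given increments."""
    x, t = float(x0), t0
    for dw in dW:
        x = x + drift(x, t) * dt + sigma(t) * dw
        t += dt
    return x

# Toy dynamics (an assumption for illustration): dX = -X dt + 0.5 dW.
drift = lambda x, t: -x
sigma = lambda t: 0.5

rng = np.random.default_rng(0)
t0, t1, n_steps = 0.5, 1.0, 200
dt = (t1 - t0) / n_steps
x_mid = 1.0                      # fixed intermediate state at time t0

# Many independent Brownian paths from the same intermediate state.
paths = rng.normal(0.0, np.sqrt(dt), size=(2000, n_steps))
endpoints = np.array(
    [euler_maruyama(x_mid, dW, t0, dt, drift, sigma) for dW in paths]
)

# The same Brownian path used twice gives the same endpoint.
same_a = euler_maruyama(x_mid, paths[0], t0, dt, drift, sigma)
same_b = euler_maruyama(x_mid, paths[0], t0, dt, drift, sigma)

print("endpoint mean/std over paths:", endpoints.mean(), endpoints.std())
print("same path, same endpoint:", same_a == same_b)
```

A map that ignores the Brownian path can return only one endpoint per intermediate state, so it cannot reproduce the spread in `endpoints`; a map that also takes the path as input can.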
Innovation
The paper introduces the Itô map, a novel stochastic flow map that predicts the future state of an SDE conditioned on both the current state and a low-dimensional Brownian path representation. Unlike traditional deterministic flow models, the Itô map explicitly models the residual stochasticity by conditioning on the Brownian path, enabling exact sampling of the conditional law p_{1|t}(· | xt). The key innovations include:
- Low-dimensional Brownian path features: Using Karhunen–Loève expansion and Haar wavelets to compress the infinite-dimensional Brownian motion into manageable feature vectors (e.g., 5 modes), facilitating scalable learning.
- Pathwise learning via self-distillation: Training the model to enforce path consistency across multiple time points, ensuring the learned stochastic flow respects the SDE dynamics.
- Control and inference: Deriving Bayesian estimators (BEL series) and gradient-based control estimators (Itô-G, Itô-GF) that leverage the path structure for reward-guided sampling.
- Theoretical guarantees: Ensuring the learned maps are pathwise consistent and respect the stochastic calculus foundations, enabling accurate, diverse, and controllable sampling.
This approach fundamentally shifts the paradigm from deterministic transport to stochastic pathwise modeling, supporting both high-quality sample generation and flexible control.
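A minimal sketch of the Karhunen–Loève compression named in the first bullet, assuming standard Brownian motion on [0, 1] and the textbook sine basis W_t ≈ Σ_k Z_k √2 sin((k − ½)πt) / ((k − ½)π) with Z_k ~ N(0, 1). The function names `kl_basis` and `sample_kl_paths` are illustrative, and how the paper actually feeds these coefficients to the network is not shown here.

```python
import numpy as np

def kl_basis(t, n_modes):
    """Scaled KL basis functions of Brownian motion on [0, 1].

    Returns an array of shape (len(t), n_modes) whose k-th column is
    sqrt(2) * sin((k - 1/2) * pi * t) / ((k - 1/2) * pi).
    """
    k = np.arange(1, n_modes + 1)
    freq = (k - 0.5) * np.pi
    return np.sqrt(2.0) * np.sin(np.outer(t, freq)) / freq

def sample_kl_paths(n_paths, t, n_modes, rng):
    """Draw Gaussian coefficients Z and build the truncated paths they encode."""
    z = rng.standard_normal((n_paths, n_modes))   # the low-dimensional features
    return z, z @ kl_basis(t, n_modes).T          # paths: (n_paths, len(t))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
n_modes = 5                                       # "e.g., 5 modes" as in the text
z, paths = sample_kl_paths(20000, t, n_modes, rng)

# Fraction of Var(W_1) = 1 captured by the first n_modes terms.
k = np.arange(1, n_modes + 1)
captured = np.sum(2.0 / ((k - 0.5) * np.pi) ** 2)
empirical = paths[:, -1].var()
print(f"captured variance at t=1: analytic {captured:.4f}, empirical {empirical:.4f}")
```

With five modes the truncated expansion retains roughly 96% of the variance of W_1, which gives a sense of how much of the path a five-dimensional Gaussian vector can carry in this idealized setting.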
Methodology
- Path feature extraction: Use Karhunen–Loève expansion to represent Brownian paths with a small set of Gaussian coefficients (e.g., 5 modes). Additionally, employ Haar wavelet decomposition to capture multi-scale path fluctuations, forming structured, finite-dimensional features.
- Itô map definition: Construct Φ_{s,t} as a function taking the current state x_s and the Brownian feature vector W, and predicting the future state x_t along the same path.
- Training objectives: Minimize path consistency loss (LSD) by matching the predicted path derivatives with the true SDE drift and diffusion terms, and enforce pathwise evolution via LPSD, which ensures the learned map respects the semigroup property.
- Pathwise training: Sample pairs of times (s, t), generate intermediate states, simulate Brownian paths, and optimize the model to produce path-consistent predictions across multiple time points.
- Control estimation: Derive Bayesian estimators (BEL series) based on pathwise Jacobians and Brownian increments, enabling reward-guided sampling without explicit reward gradient dependence (Itô-GF). When gradients are available, use Itô-G for more accurate control.
- Implementation details: Use neural networks to parameterize G_{s,t} and G_{t,t}, train with stochastic gradient descent, and incorporate low-dimensional Brownian features for scalability. Regularize to avoid singularities at t=1 by setting σt = √2(1−t).
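The semigroup property referred to in the training objectives can be written out explicitly. The statement below is the standard composition rule for a flow map driven by a fixed Brownian path W, written in the Φ_{s,t} notation of this summary; it is a sketch of the constraint only, and the exact form of the LSD and LPSD losses in the paper is not reproduced here.

```latex
\Phi_{s,s}(x, W) = x,
\qquad
\Phi_{t,u}\bigl(\Phi_{s,t}(x, W),\, W\bigr) = \Phi_{s,u}(x, W),
\qquad 0 \le s \le t \le u \le 1 .
```

In words: stepping from s to t and then from t to u along the same Brownian path should land on the same state as stepping from s to u directly.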
Experiments
The authors conduct comprehensive experiments across synthetic and real datasets. In 1D Gaussian mixtures, they compare true trajectories with predictions, achieving errors around 6.92×10^-2. In 2D posterior sampling, the model outperforms baselines like MFM-G and DPS, with SW2 distances of 0.16 and MMD of 0.024, demonstrating high-quality conditional samples. On MNIST, the model accurately reproduces image paths with pixel MSE of 0.05, validating path prediction in high dimensions. For control tasks, the model successfully modulates class proportions, with KL divergence below 0.025, indicating effective reward-guided sampling. The experiments also include ablation studies on the number of Brownian modes, path feature types, and training strategies, confirming the robustness and scalability of the approach.
Results
Quantitative results show that the proposed Itô map achieves a path prediction RMSE of 6.92×10^-2 in synthetic tasks, significantly outperforming deterministic flow models. In posterior sampling, SW2 and MMD metrics demonstrate clear advantages over baseline methods, with SW2 reduced from 1.95 (unsteered) to 0.16 (Itô-G). In image generation, pixel MSE drops to 0.05, confirming accurate path modeling. The control experiments show the model can steer class proportions with KL divergence as low as 0.024, outperforming existing control estimators. These results collectively validate the framework’s effectiveness in diverse tasks, highlighting its potential for scalable, flexible, and high-quality stochastic sampling and control.
Applications
This framework can be directly applied to high-dimensional image synthesis, posterior inference, and reward-guided generation. Its ability to perform exact conditional sampling from intermediate states makes it suitable for tasks like image editing, data augmentation, and personalized content creation. The low-dimensional Brownian feature extraction allows for scalable training and inference, even in large datasets like ImageNet. In the long term, integrating this approach with reinforcement learning and autonomous decision systems could enable real-time path planning and control in robotics, autonomous vehicles, and complex simulation environments. Its flexibility also supports multi-modal data modeling, such as video and 3D point clouds, broadening its impact across AI domains.
Limitations & Outlook
Despite promising results, the current model’s scalability to extremely high-resolution data remains computationally demanding, especially in extracting and optimizing low-dimensional path features. The fixed number of Brownian modes (e.g., 5) may limit expressiveness for highly complex or non-Gaussian paths, requiring adaptive feature selection. Additionally, robustness under severe noise or non-Gaussian dynamics has not been fully explored, and performance may degrade in such scenarios. Further research is needed to improve the efficiency of training in large-scale settings, extend the theoretical guarantees to more general stochastic processes, and develop adaptive path feature mechanisms to handle diverse data distributions.
Plain Language Accessible to non-experts
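Imagine you are playing an extremely complicated maze game. Each time you reach the halfway point, you may run into different winds or obstacles, so the route you end up taking differs. Earlier games could only tell you how to get from the start to the finish, not how each choice along the way would turn out. This new method is like being handed a magic map: it tells you how to reach the exit from any point in the middle, and it also accounts for changes in the wind, so you can quickly find the best route under different weather conditions. It uses a special bit of mathematical magic to compress all the possible winds and obstacles into a few simple clues, so you can predict different routes without having to remember much. It is like holding a magic gem that shows you the different wind directions and route changes and then helps you choose the most reliable path. That way, however changeable the weather, you can use the magic map to quickly find the safest, most fun route and enjoy the game more!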
ELI14 Explained like you're 14
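Imagine you are playing a really cool game where you have to find the exit of a maze, but the maze has wind in it, and sometimes the wind blows you off course. Earlier games could only give you one fixed route, but you know that in practice the wind will send you along different routes. This new method is like being handed a magic map: it tells you how to go, and it also accounts for changes in the wind to help you find the most reliable route. It uses a special bit of mathematical magic to compress the changes in the wind into a few simple clues, so you can predict different routes without having to remember much. It is like holding a magic gem that shows you which way the wind is blowing and then helps you choose the safest path. That way, however strong the wind, you can use the magic map to quickly find the safest, most fun route and enjoy the game more!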
Glossary
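Itô map
A path-conditioned stochastic flow map that takes an intermediate state and a Brownian path to a future state, supporting path prediction over arbitrary step sizes.
The core tool proposed in the paper, used for path generation and control.
Brownian path
A continuous stochastic process describing the random motion of a particle in a fluid, with Gaussian, independent increments.
Serves as the underlying random input; low-dimensional features extracted from it are used for path prediction.
Karhunen–Loève expansion
An orthogonal expansion of an infinite-dimensional stochastic process whose truncation to a finite number of modes is convenient for path compression and feature extraction.
Used to compress the Brownian path into a small number of features, improving model efficiency.
Haar wavelet
A multi-scale orthogonal wavelet transform used to decompose a random path locally and capture variation at different scales.
Provides multi-scale path features, increasing the model's expressiveness.
Self-distillation
A training strategy that uses intermediate results generated by the model itself as the supervision signal, improving path consistency and stability.
Used to train the Itô map and ensure path continuity.
Bayesian path estimators (BEL series)
Use Bayesian methods to form posterior estimates of paths, supporting gradient-free control strategies.
Enable reward guidance and path adjustment at inference time.
Gradient estimators (Itô-G, Itô-GF)
Path-conditioned gradient estimation methods used for control optimization in reward guidance.
Support path control when the reward function is complex or non-differentiable.
Path consistency
The requirement that path predictions at different time points are continuous and agree with the dynamics.
One of the training objectives; it keeps the predicted paths plausible.
Sliced-Wasserstein distance (SW2)
A measure of similarity between high-dimensional distributions, computed by projecting onto one dimension and then taking the Wasserstein distance.
Used to evaluate the quality of posterior sampling.
Maximum Mean Discrepancy (MMD)
A nonparametric statistical distance used to measure the difference between two distributions.
Evaluates how close the generated samples are to the true distribution.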
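The two evaluation metrics at the end of the glossary, SW2 and MMD, can be sketched as sample-based estimators. This is a minimal illustration under stated assumptions: equal sample sizes for SW2, uniformly random projection directions, a Gaussian (RBF) kernel with a hand-picked bandwidth for MMD, and the simple biased MMD² estimator. The paper's exact settings (number of projections, kernel, bandwidth) are not given in this summary, and the function names are illustrative.

```python
import numpy as np

def sliced_w2(x, y, n_proj=256, rng=None):
    """Sliced 2-Wasserstein distance between equal-size sample sets x, y of shape (n, d)."""
    rng = np.random.default_rng(0) if rng is None else rng
    dirs = rng.standard_normal((n_proj, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # unit directions
    # In 1D, optimal transport between equal-size samples pairs sorted values.
    px = np.sort(x @ dirs.T, axis=0)
    py = np.sort(y @ dirs.T, axis=0)
    return float(np.sqrt(np.mean((px - py) ** 2)))

def mmd2_rbf(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD with a Gaussian kernel."""
    def kernel(a, b):
        sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
        return np.exp(-sq / (2.0 * bandwidth**2))
    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())

rng = np.random.default_rng(1)
x = rng.standard_normal((500, 2))
y = x + np.array([3.0, 0.0])          # same shape, shifted by 3 along the first axis

print("SW2(x, x):", sliced_w2(x, x))
print("SW2(x, y):", sliced_w2(x, y))
print("MMD^2(x, x):", mmd2_rbf(x, x))
print("MMD^2(x, y):", mmd2_rbf(x, y))
```

Both quantities are zero when the two sample sets coincide and grow as the distributions move apart, which is why lower values in the reported results indicate closer agreement with the reference posterior.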
Open Questions Unanswered questions from this research
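- 1. Although low-dimensional path features are introduced, their ability to represent paths for extremely high-dimensional data (such as high-resolution images) still needs improvement; how to select the optimal feature dimension automatically is a direction for future research.
- 2. The model's performance under non-Gaussian noise has not been fully validated; in particular, the robustness of path prediction under complex noise models and nonlinear dynamics needs improvement.
- 3. There is a trade-off between path diversity and path continuity; how to maintain diversity while avoiding path degeneration is a key point for future optimization.
- 4. The mechanisms that guarantee path consistency and model stability during training still have room for improvement, especially regarding training efficiency and quality on large-scale datasets.
- 5. Joint learning of multi-scale, multimodal path features remains to be explored in order to handle more complex generation tasks and control requirements.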
Applications
Immediate Applications
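Efficient posterior sampling
Use Itô maps for fast posterior sampling of high-dimensional data, applicable to image generation, virtual reality, and similar settings, with substantially lower sampling time and higher sample diversity.
Conditional generation and guided control
Adjust specific attributes of generated samples (such as class proportions or style) through a reward function, with broad use in areas such as artistic creation and personalized recommendation.
Reinforcement learning path optimization
Combine with path control strategies to optimize path planning for robots or agents, enabling autonomous decision-making and execution of complex tasks.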
Long-term Vision
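Autonomous intelligent systems
Combine path prediction with reinforcement learning to build intelligent systems capable of autonomous decision-making, for applications such as autonomous driving and robotics.
Multimodal, multi-scale path modeling
Learn paths across modalities and scales to support dynamic decision-making and multi-task collaboration in complex environments, driving a broad upgrade of intelligent systems.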
Abstract
Recent one-step generative models accelerate sampling by learning deterministic flow maps of the underlying dynamics. These methods rely on learning from ordinary differential equations, leaving open how to define an exact distillation procedure for stochastic dynamics. We introduce the Itô map, an any-step stochastic flow map that takes an intermediate state and Brownian path and predicts future states in a single pass. The Itô map formulation yields novel estimators for inference-time control by providing cheap, differentiable access to posterior samples. Empirically, Itô maps produce diverse, conditionally valid endpoint samples from fixed intermediate states and support strong steering performance on synthetic and image-generation benchmarks. These results establish any-step SDE integration as a useful primitive for posterior sampling and stochastic control.
References (20)
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Xiaoshi Wu, Yiming Hao, Keqiang Sun et al.
Meta Flow Maps enable scalable reward alignment
Peter Potaptchik, A. Saravanan, Abbas Mammadov et al.
Diffusion Posterior Sampling for General Noisy Inverse Problems
Hyungjin Chung, Jeongsol Kim, Michael T. McCann et al.
Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song, Jascha Narain Sohl-Dickstein, Diederik P. Kingma et al.
How to build a consistency model: Learning flow maps via self-distillation
N. Boffi, M. Albergo, Eric Vanden-Eijnden
Inference-Time Alignment in Diffusion Models with Reward-Guided Generation: Tutorial and Review
Masatoshi Uehara, Yulai Zhao, Chenyu Wang et al.
A stochastic control approach to reciprocal diffusion processes
P. Pra
One Step Diffusion via Shortcut Models
Kevin Frans, Danijar Hafner, Sergey Levine et al.
Sur la géométrie différentielle des groupes de Lie de dimension infinie et ses applications à l'hydrodynamique des fluides parfaits
V. Arnold
Stochastic Interpolants: A Unifying Framework for Flows and Diffusions
M. Albergo, N. Boffi, E. Vanden-Eijnden
Bayesian learning via neural Schrödinger–Föllmer flows
Francisco Vargas, Andrius Ovsianas, D. Fernandes et al.
Statistics in Function Space
D. D. Kosambi
Conditional brownian motion and the boundary limits of harmonic functions
J. Doob
Iterated Denoising Energy Matching for Sampling from Boltzmann Densities
Tara Akhound-Sadegh, Jarrid Rector-Brooks, A. Bose et al.
Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling
Kyungmin Lee, Sihyun Yu, Jinwoo Shin
Über lineare Methoden in der Wahrscheinlichkeitsrechnung
K. Karhunen
Denoising Diffusion Probabilistic Models
Jonathan Ho, Ajay Jain, P. Abbeel
Building Normalizing Flows with Stochastic Interpolants
M. Albergo, E. Vanden-Eijnden
Loss-Guided Diffusion Models for Plug-and-Play Controllable Generation
Jiaming Song, Qinsheng Zhang, Hongxu Yin et al.
Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion
Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao et al.