Improving Robotic Generalist Policies via Flow Reversal Steering

TL;DR

Flow Reversal Steering (FRS) leverages reverse flow models to map coarse actions into high-quality behaviors, boosting zero-shot control and rapid learning in robotic policies.

cs.RO · Advanced · 2026-06-12

Andy Tang, William Chen, Andrew Wagenmaker, Chelsea Finn, Sergey Levine


robotics, deep learning, flow models, policy optimization, transfer learning

Key Findings

Methodology

This paper introduces Flow Reversal Steering (FRS), which exploits the deterministic ODE underlying a flow model to invert the forward denoising process. By integrating the flow model backward in time, FRS recovers the latent noise vector corresponding to a coarse reference action supplied by a human or a vision-language model (VLM). That noise is then fed to the forward flow model to generate a refined, in-distribution action that remains semantically aligned with the high-level guidance. The approach combines semantic reasoning with generative modeling, enabling rapid adaptation and policy improvement without extensive trial-and-error RL. Experiments on simulated benchmarks such as LIBERO and on real-world robotic platforms show that FRS improves zero-shot control, that its gains can be distilled with behavioral cloning for absolute success-rate boosts of up to 95% in under a minute of training, and that it can bootstrap reinforcement learning on complex tasks.
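The reverse-then-forward procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `toy_velocity` is a hypothetical stand-in for a learned flow policy, the convention that t = 0 is noise and t = 1 is the action is an assumption, and explicit Euler integration is chosen only for simplicity.

```python
from typing import Callable, List

Vector = List[float]
VelocityField = Callable[[Vector, float], Vector]


def toy_velocity(x: Vector, t: float) -> Vector:
    """Hypothetical stand-in for a learned flow policy: pulls samples toward one mode."""
    mode = [0.5, -0.2]
    return [2.0 * (m - xi) for xi, m in zip(x, mode)]


def integrate(v: VelocityField, x: Vector, t_start: float, t_end: float, steps: int) -> Vector:
    """Explicit Euler integration of dx/dt = v(x, t) from t_start to t_end."""
    h = (t_end - t_start) / steps  # negative when integrating backward in time
    t = t_start
    x = list(x)
    for _ in range(steps):
        vel = v(x, t)
        x = [xi + h * vi for xi, vi in zip(x, vel)]
        t += h
    return x


def invert_action(v: VelocityField, action: Vector, steps: int = 50) -> Vector:
    """Run the flow backward (t = 1 -> 0) to find the noise behind an action."""
    return integrate(v, action, 1.0, 0.0, steps)


def generate_action(v: VelocityField, noise: Vector, steps: int = 50) -> Vector:
    """Run the flow forward (t = 0 -> 1) to map noise to an action."""
    return integrate(v, noise, 0.0, 1.0, steps)


coarse_action = [0.9, 0.1]  # e.g. a rough action proposed by a human or a VLM
noise = invert_action(toy_velocity, coarse_action)
reconstructed = generate_action(toy_velocity, noise)
round_trip_error = max(abs(a - b) for a, b in zip(coarse_action, reconstructed))
```

With matched step counts the forward pass approximately reproduces the input, so this sketch only shows the inversion mechanics. How FRS configures the two passes so that the output lands on a nearby action mode of the generalist is not specified in this summary.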

Key Results

On the LIBERO benchmark, FRS improved the success rate of a baseline vision-language-action (VLA) policy by at least 10% across 42 tasks, with some tasks jumping from below 2% to over 12%. This shows that the method can convert coarse semantic cues into effective low-level actions, especially in challenging scenarios.
Using the noise vectors produced by FRS as targets, the authors trained a behavioral cloning (BC) policy in under one minute, with absolute success-rate gains of up to 95% reported across 10 diverse tasks. Integrating FRS into a reinforcement learning (RL) framework (DSRL+FRS) also improved substantially over standard RL, especially on tasks where the base policy almost always failed.
On the real-world DROID robot platform, FRS combined with VLMs enabled the robot to perform complex manipulation tasks, including object grasping, placement, and assembly, in cluttered and dynamic environments. These results support the practical applicability of FRS in real-world settings.

Significance

This work addresses a fundamental challenge in robotic policy generalization: how to effectively leverage rich behavioral priors encoded in large-scale foundation models for novel tasks. By enabling semantic guidance to steer probabilistic flow models, FRS bridges the gap between high-level reasoning and low-level control. Its ability to rapidly adapt, improve policies with minimal data, and incorporate semantic knowledge from VLMs or humans marks a significant advancement in autonomous robot learning. This approach reduces reliance on extensive data collection and trial-and-error RL, paving the way for more flexible, scalable, and intelligent robotic systems capable of operating in unstructured environments. The broader impact extends to industrial automation, service robotics, and human-robot interaction, where rapid adaptation and semantic understanding are crucial.

Technical Contribution

The core technical contribution is Flow Reversal Steering (FRS), which integrates the flow model's ODE in reverse to invert the denoising process. This extracts latent noise vectors from coarse reference actions, turning high-level semantic cues into low-level control signals. FRS plugs into existing flow-based policies and supports both zero-shot steering and policy refinement via behavioral cloning and reinforcement learning. Because the flow ODE is deterministic, the inversion is a direct backward integration rather than a search, avoiding the computationally expensive trial and error of prior RL-based noise search. FRS also accepts semantic guidance from either VLMs or humans, making it a versatile tool for multi-modal guidance. More broadly, the approach offers a way to combine generative policy models with semantic inference, improving the flexibility of robotic control systems.

Novelty

This work is the first to systematically utilize the invertibility of flow models' ODEs for policy steering in robotics, specifically through reverse flow integration to refine coarse semantic actions. Unlike previous approaches that rely on trial-and-error RL to find suitable noise vectors or interpolate reference actions with Gaussian noise, FRS directly computes the latent noise corresponding to high-level guidance, enabling precise and semantically meaningful action generation. The integration of semantic reasoning with flow model inversion represents a novel fusion of probabilistic generative modeling and symbolic inference, opening new avenues for data-efficient, adaptable robotic control. This approach also extends the application scope of flow models beyond image synthesis and into real-time robot manipulation, marking a significant leap forward in the field.

Limitations

The accuracy of reverse flow integration diminishes in highly dynamic or complex environments, where the approximation errors in backward ODE solving can lead to suboptimal or unstable actions.
FRS heavily depends on the quality and coverage of the pretrained flow models and semantic reasoners; if these models lack sufficient representational capacity or are biased, the steering effectiveness may be compromised.
Real-time deployment may face computational bottlenecks, especially when integrating complex VLMs or performing multiple reverse integrations per step, necessitating further optimization for practical use.
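The first and third limitations pull against each other: fewer integration steps are cheaper but invert less accurately. A rough illustration of that trade-off follows, assuming a hypothetical 1-D velocity field and explicit Euler integration; neither is taken from the paper.

```python
def velocity(x: float, t: float) -> float:
    """Hypothetical 1-D stand-in for a learned flow: contracts toward a mode at 0.5."""
    return 2.0 * (0.5 - x)


def euler(x: float, t_start: float, t_end: float, steps: int) -> float:
    """Explicit Euler integration of dx/dt = velocity(x, t)."""
    h = (t_end - t_start) / steps  # negative for backward integration
    t = t_start
    for _ in range(steps):
        x = x + h * velocity(x, t)
        t += h
    return x


def round_trip_error(action: float, steps: int) -> float:
    """Invert an action to noise, regenerate it, and measure the mismatch."""
    noise = euler(action, 1.0, 0.0, steps)
    recon = euler(noise, 0.0, 1.0, steps)
    return abs(recon - action)


step_counts = [5, 10, 50, 200]
errors = [round_trip_error(0.9, n) for n in step_counts]
for n, e in zip(step_counts, errors):
    print(f"steps={n:4d}  round-trip error={e:.4f}")
```

In this toy setting the error shrinks as the step count grows, while the cost grows linearly with the number of velocity evaluations. A real flow policy would pay one network forward pass per step.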

Future Work

Future research will focus on enhancing the robustness of flow inversion in more complex, dynamic scenarios, possibly through adaptive integration schemes or learned inverse models. Expanding the semantic reasoning capabilities to include richer contextual understanding and multi-modal inputs will further improve guidance quality. Additionally, integrating FRS with hierarchical control architectures and multi-robot systems could unlock new levels of autonomy. Developing more efficient algorithms for real-time inverse flow computation and exploring unsupervised or self-supervised training paradigms will be crucial for scaling this approach to industrial applications and long-term autonomous operation.

AI Executive Summary

Robotic systems have made significant strides with the advent of large-scale foundation models trained on diverse datasets, enabling multi-task generalist policies capable of following a wide array of commands. However, these policies often struggle when faced with novel or complex tasks that diverge from their training data, especially in real-world environments where trial-and-error learning is costly and time-consuming. Traditional solutions involve collecting more demonstration data or extensive reinforcement learning, both of which are resource-intensive and slow.

This paper introduces Flow Reversal Steering (FRS), a novel approach that leverages the invertibility of flow models to guide robot policies using high-level semantic cues. By performing backward integration of the flow model’s ODE, FRS can infer the latent noise vector corresponding to a coarse, semantically-guided action provided by humans or vision-language models (VLMs). This noise is then used as input to the forward flow model, producing refined, in-distribution actions that are both semantically aligned and fine-grained.
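In symbols, the two passes can be written as follows. The notation is an assumption made for illustration: $v_\theta$ is the learned velocity field, $o$ the observation, $a_{\mathrm{ref}}$ the coarse reference action, and $t = 0$ corresponds to noise while $t = 1$ corresponds to the action.

```latex
% Assumed convention: t = 0 is noise, t = 1 is the action; o is the observation.
\begin{align}
\frac{d x_t}{dt} &= v_\theta(x_t, t \mid o),
  \qquad x_0 = z \sim \mathcal{N}(0, I), \quad x_1 = a, \\
\hat{z} &= a_{\mathrm{ref}} - \int_0^1 v_\theta(x_t, t \mid o)\, dt
  \quad \text{(path ending at } x_1 = a_{\mathrm{ref}}\text{)}, \\
\hat{a} &= \hat{z} + \int_0^1 v_\theta(x_t, t \mid o)\, dt
  \quad \text{(path starting at } x_0 = \hat{z}\text{)}.
\end{align}
```

Under exact integration with identical conditioning the two maps are inverses, so $\hat{a} = a_{\mathrm{ref}}$. Any refinement must therefore come from how the two passes are discretized or conditioned, which this summary does not detail.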

The core idea hinges on the deterministic nature of flow models: reversing the denoising process lets the system map a rough instruction to a plausible low-level action. This bridges the gap between high-level semantic reasoning and continuous control, enabling robots to interpret and execute complex commands with minimal supervision. The authors report that FRS improves zero-shot control on challenging manipulation tasks in the LIBERO benchmark, and that distilling these gains with behavioral cloning yields absolute success-rate boosts of up to 95%.

Furthermore, FRS facilitates rapid policy learning through behavioral cloning (BC). By treating the inferred noise vectors as expert demonstrations, the authors train auxiliary policies that can quickly adapt to new tasks within a minute, achieving success rates comparable to fully trained policies. When integrated into reinforcement learning frameworks, FRS provides a powerful prior that accelerates exploration and policy refinement, enabling robots to master tasks that standard RL approaches fail to improve.
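Behavioral cloning in noise space amounts to regressing from observations to the inverted noise vectors. The sketch below assumes a hypothetical scalar observation feature, a linear auxiliary policy, and synthetic targets standing in for noises obtained by flow inversion. The paper's actual architecture and data are not described here.

```python
import random

random.seed(0)

# Hypothetical dataset of (observation feature, noise target) pairs. In FRS the
# targets would come from inverting good actions; here they are synthetic.
true_w, true_b = 1.5, -0.3
data = []
for _ in range(64):
    o = random.uniform(-1.0, 1.0)
    data.append((o, true_w * o + true_b + random.gauss(0.0, 0.01)))


def mse(w: float, b: float) -> float:
    """Mean squared error of the linear noise policy z = w * o + b."""
    return sum((w * o + b - z) ** 2 for o, z in data) / len(data)


# Auxiliary noise policy trained by plain gradient descent on the MSE.
w, b = 0.0, 0.0
lr = 0.3
initial_loss = mse(w, b)
for _ in range(200):
    grad_w = sum(2 * (w * o + b - z) * o for o, z in data) / len(data)
    grad_b = sum(2 * (w * o + b - z) for o, z in data) / len(data)
    w -= lr * grad_w
    b -= lr * grad_b
final_loss = mse(w, b)
```

At deployment, the predicted noise would be passed through the frozen generalist's forward flow to produce the action, so only the small auxiliary policy is trained.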

Experiments on real-world robots validate the practicality of FRS, showing effective manipulation in cluttered and dynamic scenes. The method’s ability to incorporate semantic knowledge from VLMs and human instructions makes it highly versatile and scalable. Overall, FRS represents a significant advance in robotic control, combining probabilistic generative modeling with semantic reasoning to enable fast, flexible, and robust autonomous behavior. Its potential applications span industrial automation, service robotics, and human-robot collaboration, promising a future where robots can learn and adapt with minimal supervision in complex environments.

Deep Dive

Abstract

Generalist policies can learn a wide range of skills from diverse robot datasets. In order to solve or improve on challenging new tasks, we need a way to infer and invoke the appropriate actions from the policy's rich behavioral prior, especially when directly commanding the policy fails. We focus on flow matching generalists and propose Flow Reversal Steering (FRS): a method that takes suboptimal but "reasonable" actions, finds their latent noises by passing them through the flow policy in reverse, and maps them to nearby generalist action modes. We evaluate FRS across many simulated and real-world manipulation settings. First, FRS can turn coarse semantic guidance from humans or vision-language models (VLMs) into corresponding good robot actions, improving zero-shot control. These gains can be distilled with behavioral cloning by training an auxiliary policy to output noises that the generalist maps to good actions, showing up to 95% absolute task success rate boosts in under a minute of training. Finally, FRS enables policy improvement by bootstrapping reinforcement learning with semantic knowledge, improving on several tasks that standard RL fails to improve on.


