Temporal Straightening for Latent Planning

TL;DR

Temporal Straightening adds a curvature regularizer to latent world-model training, improving open-loop planning success by 20-60% and MPC success by 20-30% on goal-reaching tasks.

cs.LG 2026-03-13

Ying Wang, Oumayma Bounou, Gaoyue Zhou, Randall Balestriero, Tim G. J. Rudner, Yann LeCun, Mengye Ren

latent planning, temporal straightening, curvature regularization, representation learning, goal-reaching tasks

Key Findings

Methodology

The paper introduces Temporal Straightening, a representation learning method that adds a curvature regularizer to the training of a latent world model. An encoder and a predictor are trained jointly, and the regularizer encourages latent trajectories to be locally straight. With less curvature, the Euclidean distance in latent space becomes a better proxy for geodesic distance, which improves the conditioning of the planning objective and makes gradient-based planning more stable.
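A minimal sketch of what a curvature penalty on a latent trajectory can look like. The cosine form below and the name `curvature_penalty` are assumptions for illustration; this summary does not give the paper's exact regularizer.

```python
import numpy as np

def curvature_penalty(z, eps=1e-8):
    """Mean (1 - cosine similarity) between consecutive latent displacements.

    z: array of shape (T, D), one latent state per time step, with T >= 3.
    The value is close to 0 for a straight trajectory and grows as the
    trajectory turns more sharply between steps.
    NOTE: an illustrative choice, not necessarily the paper's loss.
    """
    v = np.diff(z, axis=0)                      # displacements, shape (T-1, D)
    norms = np.linalg.norm(v, axis=1) + eps     # eps avoids division by zero
    u = v / norms[:, None]                      # unit directions
    cos = np.sum(u[:-1] * u[1:], axis=1)        # cosine of each turning angle
    return float(np.mean(1.0 - cos))

# A straight trajectory and a half-circle arc, both with 5 states in 2-D
straight = np.outer(np.arange(5.0), np.array([1.0, 2.0]))
angles = np.linspace(0.0, np.pi, 5)
arc = np.stack([np.cos(angles), np.sin(angles)], axis=1)

print(curvature_penalty(straight), curvature_penalty(arc))
```

During training, a penalty of this kind would be computed on encoded trajectories and added to the prediction loss, so that gradients flow into the encoder as well as the predictor.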

Key Results

Temporal Straightening significantly improves success rates across a suite of goal-reaching tasks: open-loop planning success improves by 20-60%, and MPC success by 20-30%. The gains hold across different tasks, including environments with high-dimensional observations.
In the UMaze experiment, Temporal Straightening results in smoother trajectories from the top left to the top right, with Euclidean distances more faithfully reflecting geodesic progress. The experiments also show that reducing the curvature of latent trajectories significantly improves the conditioning of the planning objective.
Comparisons of different encoder and predictor architectures reveal that models trained from scratch with ResNet exhibit superior curvature reduction, further validating the effectiveness of the Temporal Straightening method.

Significance

Temporal Straightening matters for latent planning in two ways: it improves planning stability and success rates, and it offers a geometric perspective on representation learning. Because straighter latent trajectories make Euclidean distance a better proxy for geodesic distance, the planning objective becomes better conditioned. This gives new guidance for building latent world models, particularly for high-dimensional observations, where planning in a compact latent space can reduce computational cost and latency.

Technical Contribution

The technical contribution is a curvature regularizer that straightens latent trajectories. Unlike reconstruction objectives, the method aims for representations that are sufficient for predicting dynamics and free of task-irrelevant information. By jointly training an encoder and a predictor, Temporal Straightening improves the geometric structure of the latent space and makes gradient-based planning more stable. The paper supports the method both with analysis (lower curvature improves the conditioning of the planning objective) and with experiments across several environments.
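To make "gradient-based planning" concrete, here is a toy sketch that optimizes an action sequence by gradient descent on the squared distance between the final latent state and the goal. The additive dynamics `z_{t+1} = z_t + a_t`, the function names, and the hyperparameters are assumptions for illustration; in the paper the rollout would go through the learned predictor, with gradients obtained by backpropagation.

```python
import numpy as np

def rollout(z0, actions):
    """Toy latent dynamics (an assumption): z_{t+1} = z_t + a_t."""
    return z0 + np.cumsum(actions, axis=0)

def plan(z0, z_goal, horizon=5, steps=200, lr=0.05):
    """Gradient descent on the actions to minimise ||z_H - z_goal||^2."""
    actions = np.zeros((horizon, z0.shape[0]))
    for _ in range(steps):
        z_final = rollout(z0, actions)[-1]
        # With additive dynamics, d cost / d a_t = 2 (z_H - z_goal) for every t
        grad = 2.0 * (z_final - z_goal)
        actions -= lr * grad                    # same gradient broadcast to all steps
    return actions

z0 = np.array([0.0, 0.0])
z_goal = np.array([1.0, -2.0])
actions = plan(z0, z_goal)
final = rollout(z0, actions)[-1]
print(final)
```

This toy cost is a well-conditioned quadratic, so plain gradient descent converges quickly. The paper's argument is that curved latent trajectories make the real objective far less benign, and that straightening moves it closer to this easy regime.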

Novelty

The innovation of Temporal Straightening lies in introducing curvature regularization to optimize the linearity of latent trajectories. Inspired by the perceptual straightening hypothesis in human vision, this method is applied in latent planning for the first time. Unlike previous reconstruction objectives, Temporal Straightening focuses more on the sufficiency of dynamic prediction, avoiding interference from task-irrelevant information.

Limitations

In long-horizon planning tasks, prediction errors may accumulate, leading to trajectory drift. This issue is particularly evident in long rollouts and requires further research.
In complex dynamic environments, Temporal Straightening may require higher computational resources to achieve real-time planning.
Although Temporal Straightening performs well in multiple tasks, its performance in more complex 3D environments remains to be verified.
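The drift described in the first limitation can be illustrated with a toy system in which the planner's model is slightly wrong at every step. The constant `BIAS`, the additive dynamics, and all function names are assumptions for illustration and are not taken from the paper. The sketch contrasts open-loop execution, where errors add up over the horizon, with MPC, which replans from the observed state at each step.

```python
import numpy as np

BIAS = np.array([0.05, -0.02])   # per-step model error, unknown to the planner

def true_step(z, a):
    """True dynamics: the planner's model plus an unmodelled bias."""
    return z + a + BIAS

def plan_with_model(z, goal, steps_left):
    """Planner's model assumes z_{t+1} = z_t + a_t, so equal steps reach the goal."""
    return np.tile((goal - z) / steps_left, (steps_left, 1))

def open_loop(z0, goal, horizon):
    """Plan once, then execute the whole action sequence without feedback."""
    z = z0.copy()
    for a in plan_with_model(z0, goal, horizon):
        z = true_step(z, a)
    return z

def mpc(z0, goal, horizon):
    """Replan from the current state at every step and execute the first action."""
    z = z0.copy()
    for t in range(horizon):
        a = plan_with_model(z, goal, horizon - t)[0]
        z = true_step(z, a)
    return z

z0, goal, H = np.zeros(2), np.array([1.0, 1.0]), 10
err_open = np.linalg.norm(open_loop(z0, goal, H) - goal)
err_mpc = np.linalg.norm(mpc(z0, goal, H) - goal)
print(err_open, err_mpc)
```

In this toy setting the open-loop error grows linearly with the horizon, while the MPC error stays at the size of a single step's bias. A learned predictor's errors need not be constant, so this shows only the mechanism, not the magnitudes reported in the paper.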

Future Work

Future research directions include verifying the effectiveness of Temporal Straightening in more complex 3D environments and exploring its applications in robotic planning. Additionally, reducing the accumulation of prediction errors in long-horizon planning tasks is a worthwhile research topic. The community can further explore how to combine other representation learning methods to enhance the robustness and adaptability of Temporal Straightening.

AI Executive Summary

In latent planning, learning good representations is crucial. However, pretrained visual encoders, while capable of producing strong semantic visual features, are not tailored for planning and may contain information irrelevant or even detrimental to planning. Inspired by the perceptual straightening hypothesis in human visual processing, this paper introduces Temporal Straightening to improve representation learning for latent planning.

Temporal Straightening optimizes the linearity of latent trajectories through curvature regularization. Specifically, the paper jointly trains an encoder and a predictor, using curvature regularization to encourage locally straightened latent trajectories. Experimental results show that this method makes the Euclidean distance in latent space a better proxy for geodesic distance, thereby improving the conditioning of the planning objective.
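One way to write the joint objective described above, as a hedged sketch. The symbols $f_\theta$ (encoder), $g_\phi$ (predictor), the weight $\lambda$, and the cosine form of the curvature term are assumptions for illustration, not the paper's notation or exact loss.

```latex
z_t = f_\theta(o_t), \qquad \hat{z}_{t+1} = g_\phi(z_t, a_t), \qquad v_t = z_{t+1} - z_t

\mathcal{L}_{\text{pred}} = \frac{1}{T-1} \sum_{t=1}^{T-1} \lVert \hat{z}_{t+1} - z_{t+1} \rVert_2^2

\mathcal{L}_{\text{curv}} = \frac{1}{T-2} \sum_{t=1}^{T-2} \left( 1 - \frac{\langle v_t, v_{t+1} \rangle}{\lVert v_t \rVert_2 \, \lVert v_{t+1} \rVert_2} \right)

\mathcal{L} = \mathcal{L}_{\text{pred}} + \lambda \, \mathcal{L}_{\text{curv}}
```

Here $o_t$ and $a_t$ are the observation and action at step $t$ of a length-$T$ trajectory. The curvature term vanishes when consecutive displacements $v_t$ and $v_{t+1}$ point in the same direction, which is what "locally straightened" means in this sketch.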

Across a suite of goal-reaching tasks, Temporal Straightening significantly improves success rates. Open-loop planning success improves by 20-60%, and MPC by 20-30%. These results demonstrate the method's effectiveness across different tasks, especially in high-dimensional observation environments.

Despite its superior performance in multiple tasks, Temporal Straightening faces challenges in long-horizon planning tasks where prediction errors may accumulate, leading to trajectory drift. Additionally, in complex dynamic environments, the method may require higher computational resources to achieve real-time planning. Future research directions include verifying the effectiveness of Temporal Straightening in more complex 3D environments and exploring its applications in robotic planning.

Deep Analysis

Background

Latent planning has emerged as a significant research direction in machine learning. By compressing high-dimensional observations into compact latent representations, latent planning improves efficiency and generalization. Early visual world models directly predicted in pixel spaces and used generated images for control. However, as research progressed, more methods began encoding high-dimensional sensory inputs into compact latent representations and planning in the resulting latent space. Existing methods typically add reconstruction-based objectives when training the encoder, but these objectives often overemphasize low-level visual details and may fail to capture task-relevant information. Recent approaches decouple perception from dynamics by leveraging strong pretrained visual encoders, yet these encoders are not optimized for planning and may lead to challenging planning objectives.

Core Problem

Optimizing the learned latent space remains challenging in latent planning. The induced planning objective is typically highly non-convex, potentially causing gradient-based optimizers to struggle. Moreover, commonly used goal cost metrics based on Euclidean distance can be misleading if the embedding space is not properly regularized. In particular, when latent trajectories are highly curved, straight-line distances in embedding space misrepresent the geodesic distance along feasible transitions. These challenges call for better representations tailored for latent planning.
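A small numerical illustration of how a curved trajectory makes Euclidean distance misleading. The U-shaped path and the helper names are made up for this example and are not taken from the paper.

```python
import numpy as np

def path_length(z):
    """Sum of step lengths along a trajectory z of shape (T, D)."""
    return float(np.sum(np.linalg.norm(np.diff(z, axis=0), axis=1)))

def distance_to_goal(z):
    """Euclidean distance from every state to the final state."""
    return np.linalg.norm(z - z[-1], axis=1)

# A U-shaped latent trajectory: three sides of a unit square
u_shape = np.array(
    [[0, 1], [0, 0.5], [0, 0], [0.5, 0], [1, 0], [1, 0.5], [1, 1]], dtype=float
)
# A straight trajectory between the same two endpoints
line = np.linspace([0.0, 1.0], [1.0, 1.0], 7)

print(path_length(u_shape), np.linalg.norm(u_shape[-1] - u_shape[0]))
print(distance_to_goal(u_shape))
print(distance_to_goal(line))
```

Along the U-shaped path, the Euclidean distance to the goal first increases even though every step makes progress along the only feasible route, so a planner that minimises this distance is pushed the wrong way. Along the straight path, the distance decreases monotonically and equals the remaining path length.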

Innovation

Methodology

Temporal Straightening optimizes the linearity of latent trajectories through curvature regularization.
Jointly trains an encoder and a predictor, making the Euclidean distance in latent space a better proxy for geodesic distance.
Uses curvature regularization to encourage locally straightened latent trajectories, enhancing the stability of gradient-based planning.
Significantly improves success rates across a suite of goal-reaching tasks.

Experiments

The experimental design includes evaluating planning performance across four environments: Wall, PointMaze UMaze, a medium-sized maze, and PushT. The experiments use frozen DINOv2 spatial features or CLS tokens. Following DINO-WM's setup, all environments use a frameskip of 5. Details on the environments and experiments are provided in the appendix of the paper. The results demonstrate the method's effectiveness across different tasks, especially in high-dimensional observation environments.
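As an illustration of the frameskip setting, the sketch below subsamples a trajectory so that the world model sees every fifth observation and receives the skipped actions as one concatenated action. This interpretation of frameskip, and the function name `apply_frameskip`, are assumptions; the summary does not spell out how the setting is implemented.

```python
import numpy as np

def apply_frameskip(frames, actions, frameskip=5):
    """Keep every `frameskip`-th frame and concatenate the actions in between.

    frames:  (T + 1, ...) observations
    actions: (T, A) per-step actions, with T divisible by frameskip
    Returns frames of shape (T // frameskip + 1, ...) and actions of shape
    (T // frameskip, frameskip * A).
    """
    T, A = actions.shape
    if T % frameskip != 0:
        raise ValueError("number of actions must be divisible by frameskip")
    kept = frames[::frameskip]
    grouped = actions.reshape(T // frameskip, frameskip * A)
    return kept, grouped

frames = np.arange(11)                      # stand-ins for 11 observations
actions = np.arange(20.0).reshape(10, 2)    # 10 two-dimensional actions
kept, grouped = apply_frameskip(frames, actions)
print(kept, grouped.shape)
```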

Results

Experimental results show that open-loop planning success improves by 20-60%, and MPC by 20-30%. These results demonstrate the method's effectiveness across different tasks, especially in high-dimensional observation environments. Comparisons of different encoder and predictor architectures reveal that models trained from scratch with ResNet exhibit superior curvature reduction, further validating the effectiveness of the Temporal Straightening method.

Plain Language (accessible to non-experts)

Imagine you're in a maze, trying to reach the goal. If you judge your progress by the straight-line distance to the goal, the walls can fool you: a step that looks like it brings you closer may lead into a dead end. Temporal Straightening is like redrawing the map so that the real route through the maze becomes nearly a straight line. On the redrawn map, the straight-line distance tells you how far you actually still have to walk, so a planner that follows it reaches the goal more reliably and is not distracted by unnecessary details.

ELI14 (explained like you're 14)

Hey there! Imagine you're playing a maze game and need to get to the finish line. If you only look at how far away the finish is "as the crow flies", the walls can trick you: a move that looks closer might run you straight into a dead end. Temporal Straightening is like getting a redrawn map where the real route through the maze is almost a straight line. On that map, "how far away it looks" and "how far you really have to walk" are nearly the same thing, so the computer planning your moves gets to the finish line much more often. Isn't that cool?

Glossary

Latent Planning

Planning carried out in a compact latent space, obtained by encoding high-dimensional observations, rather than in the raw observation space.

In this paper, latent planning optimizes action sequences using a learned latent world model.

Temporal Straightening

A method that optimizes the linearity of latent trajectories through curvature regularization.

Temporal Straightening is used to improve the geometric structure of latent space.

Curvature Regularization

A training penalty on the curvature of latent trajectories, that is, on how much the direction of travel changes between consecutive steps.

Curvature regularization is used to encourage locally straightened latent trajectories.

Euclidean Distance

The straight-line distance between two points.

In the paper, Euclidean distance in latent space serves as a proxy for geodesic distance; straightening makes that proxy more accurate.

Geodesic Distance

The shortest distance along feasible paths.

Geodesic distance is used to evaluate path lengths in latent space.

Latent Space

A compact representation space obtained by compressing high-dimensional observations.

Latent space is used for planning and optimization.

Representation Learning

A method that improves model performance by learning effective data representations.

Here, representation learning shapes the encoder so that latent trajectories are straighter and easier to plan over.

Goal-reaching Tasks

Tasks that require finding the optimal path to reach a goal.

Goal-reaching tasks are used to evaluate the performance of Temporal Straightening.

Gradient-based Planning

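A planning method that optimizes a sequence of actions by following the gradient of a planning cost through a differentiable dynamics model.

In this paper, gradient-based planning minimizes the distance to the goal in latent space.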

Non-convex Optimization

An optimization problem where the objective function is not convex and may have multiple local optima.

Non-convex optimization is a challenge in latent planning.

Open Questions (unanswered questions from this research)

1 In long-horizon planning, prediction errors can accumulate and cause trajectory drift, most visibly in long rollouts. How can this accumulation be reduced to make the method more robust?
2 In complex dynamic environments, Temporal Straightening may need more computational resources to plan in real time. How can its efficiency be improved?
3 The performance of Temporal Straightening in more complex 3D environments remains to be verified.
4 Can Temporal Straightening be combined with other representation learning methods to improve its robustness and adaptability?

Applications

Immediate Applications

Robotic Path Planning

Temporal Straightening could be used in robotic path planning with a learned latent world model. Straighter latent trajectories make the distance-to-goal cost more reliable, which may help robots reach their goals more consistently in complex environments.

Autonomous Driving

In autonomous driving, Temporal Straightening could help a latent planner operate in complex urban environments. A better-conditioned planning objective may allow more stable and faster decisions.

Game AI

In game AI, Temporal Straightening could help agents that plan in latent space navigate complex game environments. More stable planning may let characters complete tasks more reliably.

Long-term Vision

Smart City Traffic Management

In the longer term, Temporal Straightening might support smart city traffic management systems that plan with learned world models, for example to optimize traffic flow and reduce congestion. This remains speculative and is not evaluated in the paper.

Intelligent Navigation in Complex Environments

In complex environments, Temporal Straightening could help agents that plan in latent space navigate more efficiently. Straighter latent trajectories may make planning more stable as the environment changes and improve task completion rates.

Abstract

Learning good representations is essential for latent planning with world models. While pretrained visual encoders produce strong semantic visual features, they are not tailored to planning and contain information irrelevant -- or even detrimental -- to planning. Inspired by the perceptual straightening hypothesis in human visual processing, we introduce temporal straightening to improve representation learning for latent planning. Using a curvature regularizer that encourages locally straightened latent trajectories, we jointly learn an encoder and a predictor. We show that reducing curvature this way makes the Euclidean distance in latent space a better proxy for the geodesic distance and improves the conditioning of the planning objective. We demonstrate empirically that temporal straightening makes gradient-based planning more stable and yields significantly higher success rates across a suite of goal-reaching tasks.

References (20)

Navigation World Models. Amir Bar, Gaoyue Zhou, Danny Tran et al. 2024.

Diffusion policy: Visuomotor policy learning via action diffusion. Cheng Chi, S. Feng, Yilun Du et al. 2023.

DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning. Gaoyue Zhou, Hengkai Pan, Yann LeCun et al. 2024.

DINOv3. Oriane Siméoni, Huy V. Vo, Maximilian Seitzer et al. 2025.

Deep Residual Learning for Image Recognition. Kaiming He, X. Zhang, Shaoqing Ren et al. 2015.

Optimization of computer simulation models with rare events. R. Rubinstein. 1997.

Linear Systems. 2010.

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning. Mahmoud Assran, Adrien Bardes, David Fan et al. 2025.

Mastering Atari with Discrete World Models. Danijar Hafner, T. Lillicrap, Mohammad Norouzi et al. 2020.

Momentum Contrast for Unsupervised Visual Representation Learning. Kaiming He, Haoqi Fan, Yuxin Wu et al. 2019.

AI-Generated Video Detection via Perceptual Straightening. Christian Internò, Robert Geirhos, Markus Olhofer et al. 2025.

Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control. Nir Levine, Yinlam Chow, Rui Shu et al. 2019.

Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models. Vlad Sobal, Wancong Zhang, Kyunghyun Cho et al. 2025.

TCLR: Temporal Contrastive Learning for Video Representation. I. Dave, Rohit Gupta, Mamshad Nayeem Rizve et al. 2021.

Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images. Manuel Watter, Jost Tobias Springenberg, J. Boedecker et al. 2015.

Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening. Piyush Bagad, Andrew Zisserman. 2025.

Variance-Covariance Regularization Improves Representation Learning. Jiachen Zhu, Ravid Shwartz-Ziv, Yubei Chen et al. 2023.

Mastering Diverse Domains through World Models. Danijar Hafner, J. Pašukonis, Jimmy Ba et al. 2023.

Neural Discrete Representation Learning. Aäron van den Oord, O. Vinyals, K. Kavukcuoglu. 2017.

Mathematical Control Theory: Deterministic Finite Dimensional Systems. Eduardo Sontag. 1990.
