Representation Learning for Spatiotemporal Physical Systems

TL;DR

Using Joint Embedding Predictive Architectures (JEPA) for learning representations in latent space significantly enhances parameter estimation accuracy.

cs.LG 🔴 Advanced 2026-03-14
Helen Qu Rudy Morel Michael McCabe Alberto Bietti François Lanusse Shirley Ho Yann LeCun
self-supervised learning physical modeling latent space parameter estimation machine learning

Key Findings

Methodology

This paper studies a self-supervised learning approach based on Joint Embedding Predictive Architectures (JEPA), which predicts in latent space rather than at the pixel level as traditional methods do. By minimizing error in the representation space, JEPA captures high-level information about physical systems. Concretely, the method encodes samples from a temporal sequence, predicts the target's embedding from the context's embedding, and optimizes a VICReg loss to prevent representational collapse.

Key Results

  • JEPA achieved a mean squared error (MSE) of 0.16 on the active matter dataset, compared with VideoMAE's 0.67. It also improved by 43% and 28% on the shear flow and Rayleigh-Bénard convection datasets, respectively.
  • In the shear flow parameter prediction task, JEPA reached an MSE of 0.4 with just 50% of the fine-tuning data, approaching its best performance, whereas VideoMAE's MSE was 0.67 even with 100% of the data.
  • Compared with physics-specific methods such as DISCO and MPP, JEPA was competitive: it reached an MSE of 0.057 on the active matter dataset, though on the Rayleigh-Bénard convection dataset DISCO's MSE of 0.01 beat JEPA's 0.13.
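A minimal sketch of the downstream protocol behind these numbers: take embeddings from a (frozen) pretrained encoder and fit a small regression head to estimate a governing physical parameter. Everything here is illustrative, with random stand-in embeddings and a closed-form ridge head; the paper fine-tunes real encoders rather than fitting a linear probe.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in embeddings for N trajectories from a pretrained encoder
# (illustrative: in practice these come from the frozen JEPA encoder).
N, D_LAT = 200, 16
Z = rng.normal(size=(N, D_LAT))

# Ground-truth physical parameter (e.g., a viscosity-like scalar),
# assumed here to be linearly decodable from the embeddings plus noise.
true_w = rng.normal(size=D_LAT)
y = Z @ true_w + 0.01 * rng.normal(size=N)

# Fit a ridge-regression head on the frozen embeddings (closed form).
lam = 1e-3
w_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(D_LAT), Z.T @ y)

# Parameter-estimation MSE of the fitted head.
head_mse = np.mean((Z @ w_hat - y) ** 2)
```

If the representation exposes the parameter well, the head's MSE sits near the label-noise floor, which is the intuition behind using parameter-estimation MSE to probe physical relevance.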

Significance

This study demonstrates the advantages of latent space prediction in self-supervised learning for physical systems. Compared to traditional pixel-level prediction and autoregressive models, latent space prediction not only excels in parameter estimation tasks but also offers superior sample efficiency. These findings provide new insights for scientific machine learning, suggesting that decoupling physical relevance from generative fidelity could be a promising research direction.

Technical Contribution

Technically, the JEPA approach studied in this paper predicts in latent space, offering a different set of trade-offs from existing autoregressive and pixel-level prediction methods. By minimizing error in the representation space, JEPA effectively captures high-level information about physical systems and excels at parameter estimation tasks.

Novelty

This work is among the first to apply latent-space predictive self-supervised learning (JEPA) to spatiotemporal physical systems. Unlike prior physical modeling methods, it does not rely on reconstructing pixel-level detail but improves downstream accuracy by capturing high-level information.

Limitations

  • JEPA's performance on the Rayleigh-Bénard convection dataset is inferior to DISCO, possibly due to limited generalization capabilities in certain complex physical phenomena.
  • Although JEPA excels in parameter estimation tasks, its performance in other types of scientific tasks has not been verified.
  • JEPA's training process still requires significant computational resources, which may limit its application in resource-constrained environments.

Future Work

Future research directions include exploring JEPA's application in other scientific tasks, such as qualitative prediction and complex system behavior analysis. Further optimization of JEPA's computational efficiency and generalization capabilities is also an important research direction.

AI Executive Summary

Understanding and predicting the evolution of physical systems has long been a challenging problem in scientific research. Traditional machine learning methods often rely on autoregressive models, which simulate the system's evolution through frame-by-frame prediction. However, this approach is not only computationally expensive but also prone to cumulative errors in long-term predictions.

This paper applies the Joint Embedding Predictive Architecture (JEPA), which significantly improves the accuracy of physical parameter estimation by predicting in the latent space. Unlike traditional pixel-level prediction methods, JEPA captures high-level information about physical systems by minimizing errors in the representation space.

In experiments, JEPA performed excellently across multiple physical system datasets. On the active matter dataset, JEPA achieved a mean squared error (MSE) of 0.16, significantly outperforming VideoMAE's 0.67. Additionally, JEPA improved by 43% and 28% on shear flow and Rayleigh-Bénard convection datasets, respectively.

This research is not only significant in academia but also provides new insights for the industry. By applying self-supervised learning to physical systems, JEPA demonstrates the advantages of latent space prediction, suggesting that decoupling physical relevance from generative fidelity could be a promising research direction.

However, JEPA's performance in certain complex physical phenomena still needs improvement, and future research could explore its application in other scientific tasks. Additionally, further optimization of JEPA's computational efficiency and generalization capabilities is an important research direction.

Deep Analysis

Background

In recent years, significant progress has been made in the application of machine learning to physical systems. Traditional methods mainly focus on autoregressive models, which simulate the system's evolution through frame-by-frame prediction. However, this approach is computationally expensive and prone to cumulative errors in long-term predictions. Additionally, pixel-level prediction methods, while capable of capturing detailed information, are insufficient in extracting high-level physical information. Thus, effectively learning and representing high-level information in physical systems has become an important research direction.

Core Problem

The evolution of physical systems is often governed by complex partial differential equations (PDEs), making precise simulation challenging. Autoregressive emulators accumulate errors over long rollouts, and pixel-level objectives reward reconstruction detail rather than the extraction of high-level physical quantities. The core problem is therefore how to learn representations that expose this high-level physical information.

Innovation

The Joint Embedding Predictive Architecture (JEPA) proposed in this paper predicts in the latent space, offering several innovations compared to traditional pixel-level prediction methods:

1. JEPA captures high-level information about physical systems by minimizing errors in the representation space.

2. JEPA excels across multiple physical system datasets, significantly improving the accuracy of physical parameter estimation.

3. JEPA also offers superior sample efficiency, achieving good performance with less fine-tuning data.

Methodology

The methodology of this paper includes the following key steps:

  • Use the Joint Embedding Predictive Architecture (JEPA) to predict in the latent space.
  • Capture high-level information about physical systems by minimizing errors in the representation space.
  • Encode and predict samples from temporal sequences, optimizing the VICReg loss function to prevent representational collapse.
  • Validate JEPA's performance in experiments across multiple physical system datasets.
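The core of the loop above can be sketched with toy linear stand-ins for the encoder and predictor. All names, shapes, and the linear maps here are illustrative: the actual models are neural networks trained by gradient descent, and the target encoder is typically handled with a stop-gradient or exponential moving average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear stand-ins for the learned networks (illustrative only).
D_PIX, D_LAT = 64, 16
W_enc = rng.normal(scale=0.1, size=(D_PIX, D_LAT))   # shared encoder
W_pred = rng.normal(scale=0.1, size=(D_LAT, D_LAT))  # latent predictor

def encode(frames):
    """Map raw frames (batch, D_PIX) to latent embeddings (batch, D_LAT)."""
    return frames @ W_enc

def jepa_prediction_error(context, target):
    """Predict the target frame's embedding from the context frame's
    embedding, and score the error in latent space, not pixel space."""
    z_ctx = encode(context)
    z_tgt = encode(target)   # in practice: stop-gradient / EMA target encoder
    z_hat = z_ctx @ W_pred
    return np.mean((z_hat - z_tgt) ** 2)

# One sample: consecutive frames from a simulated trajectory.
x_t  = rng.normal(size=(8, D_PIX))   # context frames
x_t1 = rng.normal(size=(8, D_PIX))   # target frames (next time step)
err = jepa_prediction_error(x_t, x_t1)  # training minimizes this error
```

The key design choice is that the loss never touches pixels: the predictor only has to get the target's embedding right, so the encoder is free to discard low-level detail that is irrelevant to the system's high-level state.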

Experiments

The experimental design includes the following aspects:

  • Datasets: active matter, shear flow, and Rayleigh-Bénard convection.
  • Baselines: VideoMAE, DISCO, MPP.
  • Evaluation metric: mean squared error (MSE).
  • Key hyperparameters: the VICReg loss weights λ, µ, ν.

Results

Experimental results show that JEPA performs excellently across multiple physical system datasets. On the active matter dataset, JEPA achieved a mean squared error (MSE) of 0.16, significantly outperforming VideoMAE's 0.67. Additionally, JEPA improved by 43% and 28% on shear flow and Rayleigh-Bénard convection datasets, respectively. These results indicate that JEPA effectively captures high-level information about physical systems.

Applications

Application scenarios for JEPA in physical systems include:

  • Parameter estimation: by predicting in the latent space, JEPA significantly improves the accuracy of physical parameter estimation.
  • Qualitative prediction: the high-level information JEPA captures can support qualitative predictions of system behavior.
  • Complex system analysis: JEPA's strong performance across multiple physical system datasets suggests potential for analyzing complex systems.

Limitations & Outlook

Although JEPA performs excellently across multiple physical system datasets, its performance in certain complex physical phenomena still needs improvement. Additionally, JEPA's training process still requires significant computational resources, which may limit its application in resource-constrained environments. Future research could explore JEPA's application in other scientific tasks and further optimize its computational efficiency and generalization capabilities.

Plain Language Accessible to non-experts

Imagine you're cooking in a kitchen. Traditional machine learning methods are like following a recipe step-by-step, requiring precise instructions and a lot of time. The JEPA method is like having mastered the basic principles of cooking, allowing you to adapt flexibly based on the ingredients, quickly creating delicious dishes. JEPA predicts in the latent space, like constructing a high-level concept of the dish in your mind rather than focusing on every detail. This way, you can not only make the dish faster but also adjust according to different ingredients and conditions, creating dishes that better suit your taste. This approach is equally applicable in physical systems, where JEPA can quickly capture high-level information, improving the accuracy of parameter estimation.

ELI14 Explained like you're 14

Hey there, young friend! Imagine you're playing a super complex game with many levels, each with different challenges. Traditional methods are like starting from scratch every time, solving each small problem step-by-step, taking a lot of time. The JEPA method is like having mastered the core skills of the game, allowing you to quickly pass levels and adapt flexibly to new challenges. JEPA predicts in the latent space, like finding a hidden shortcut in the game, allowing you to reach the finish line faster. This method not only helps you perform better in the game but also gives you more confidence when facing new challenges!

Glossary

Joint Embedding Predictive Architecture (JEPA)

A self-supervised learning method that predicts in the latent space by minimizing errors in the representation space to capture high-level information about physical systems.

In this paper, JEPA is used to improve the accuracy of physical parameter estimation.

Self-supervised learning

A learning method that does not require manual annotations, learning useful representations from data by designing pretext tasks.

Self-supervised learning is used in this paper to capture high-level information about physical systems.

Latent space

A low-dimensional representation of data, often used to capture high-level features of the data.

JEPA predicts in the latent space to improve the accuracy of parameter estimation.

Mean squared error (MSE)

A metric for evaluating the performance of prediction models by calculating the squared difference between predicted and true values.

MSE is used in this paper to evaluate JEPA's performance in parameter estimation tasks.
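As a concrete check, MSE is just the average of the squared residuals (the numbers below are illustrative, not from the paper):

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error between predictions and ground truth."""
    return np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)

# Illustrative: predicted vs. true physical parameters.
print(mse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))  # ≈ 0.02
```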

Active matter

A physical system consisting of active particles moving in a fluid, forming complex collective dynamics.

The active matter dataset is used to evaluate JEPA's performance.

Shear flow

A fluid dynamics phenomenon in which layers of fluid move parallel to each other at different velocities, potentially leading to vortices or turbulence.

The shear flow dataset is used to evaluate JEPA's performance in parameter estimation tasks.

Rayleigh-Bénard convection

A thermal convection phenomenon where fluid layers form convection cells due to a temperature gradient.

The Rayleigh-Bénard convection dataset is used to test JEPA's performance.

VICReg loss function

A self-supervised loss combining invariance, variance, and covariance terms; the variance and covariance terms prevent representational collapse while the invariance term aligns matched embeddings.

JEPA optimizes the VICReg loss function to capture high-level information about physical systems.
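The three VICReg terms can be sketched as follows. This is a simplified NumPy version: the weights `lam`, `mu`, `nu` and the epsilon follow the conventions of the original VICReg formulation, but the exact values here are illustrative defaults, not the paper's settings.

```python
import numpy as np

def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0, eps=1e-4):
    """VICReg = invariance + variance + covariance terms (sketch)."""
    n, d = z_a.shape
    # Invariance: matched embeddings should agree.
    inv = np.mean((z_a - z_b) ** 2)
    # Variance: hinge keeps each dimension's std above 1 (anti-collapse).
    std_a = np.sqrt(z_a.var(axis=0) + eps)
    std_b = np.sqrt(z_b.var(axis=0) + eps)
    var = np.mean(np.maximum(0, 1 - std_a)) + np.mean(np.maximum(0, 1 - std_b))
    # Covariance: penalize off-diagonal correlations between dimensions.
    za = z_a - z_a.mean(axis=0)
    zb = z_b - z_b.mean(axis=0)
    cov_a = za.T @ za / (n - 1)
    cov_b = zb.T @ zb / (n - 1)
    off_diag = lambda c: ((c - np.diag(np.diag(c))) ** 2).sum() / d
    cov = off_diag(cov_a) + off_diag(cov_b)
    return lam * inv + mu * var + nu * cov
```

In the JEPA setting, `z_a` would be the predicted embedding and `z_b` the target embedding; the variance and covariance terms are what stop the trivial solution where the encoder maps everything to a constant.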

VideoMAE

A self-supervised learning method based on masked autoencoding, primarily used for feature learning from video data.

VideoMAE is used as a baseline model in this paper to compare with JEPA.

DISCO

A learning method for multi-physics prediction that improves prediction accuracy by learning evolution operators.

DISCO is used as a baseline for physical modeling methods in this paper.

Open Questions Unanswered questions from this research

  • 1 How can JEPA be applied to more complex physical systems? Current research mainly focuses on relatively simple physical systems, and JEPA's performance in more complex systems has not been verified. Future research needs to explore JEPA's application in these systems and optimize its generalization capabilities.
  • 2 How does JEPA perform in other scientific tasks? Although JEPA excels in parameter estimation tasks, its performance in other types of scientific tasks has not been verified. Future research could explore JEPA's application in qualitative prediction and complex system analysis.
  • 3 How can JEPA's computational efficiency be improved? Despite JEPA's excellent performance, its training process still requires significant computational resources. Future research could explore optimizing JEPA's computational efficiency for application in resource-constrained environments.
  • 4 How does JEPA's performance vary across different datasets? The experiments in this paper mainly focus on specific datasets, and JEPA's performance on other types of datasets may differ. Future research could explore JEPA's performance on different datasets and analyze the reasons.
  • 5 How can the VICReg loss function be further optimized? The VICReg loss function performs well in preventing mode collapse, but further optimization may be needed in some cases. Future research could explore methods to optimize the VICReg loss function to improve JEPA's performance.

Applications

Immediate Applications

Physical Parameter Estimation

JEPA significantly improves the accuracy of physical parameter estimation, applicable to scientific research and engineering applications requiring high-precision parameter estimation.

Complex System Analysis

By predicting in the latent space, JEPA captures high-level information about complex systems, aiding in the analysis of complex system behavior.

Qualitative Prediction

JEPA captures high-level information about physical systems, aiding in qualitative prediction of system behavior, suitable for scientific research requiring qualitative analysis.

Long-term Vision

Foundation for Scientific Machine Learning

JEPA provides new insights for scientific machine learning and may become a foundational method in the future, advancing scientific research.

Cross-domain Applications

JEPA's latent space prediction method may be applied in other fields, such as biomedicine and meteorology, advancing these fields.

Abstract

Machine learning approaches to spatiotemporal physical systems have primarily focused on next-frame prediction, with the goal of learning an accurate emulator for the system's evolution in time. However, these emulators are computationally expensive to train and are subject to performance pitfalls, such as compounding errors during autoregressive rollout. In this work, we take a different perspective and look at scientific tasks further downstream of predicting the next frame, such as estimation of a system's governing physical parameters. Accuracy on these tasks offers a uniquely quantifiable glimpse into the physical relevance of the representations of these models. We evaluate the effectiveness of general-purpose self-supervised methods in learning physics-grounded representations that are useful for downstream scientific tasks. Surprisingly, we find that not all methods designed for physical modeling outperform generic self-supervised learning methods on these tasks, and methods that learn in the latent space (e.g., joint embedding predictive architectures, or JEPAs) outperform those optimizing pixel-level prediction objectives. Code is available at https://github.com/helenqu/physical-representation-learning.

cs.LG cs.CV

References (20)

DISCO: learning to DISCover an evolution Operator for multi-physics-agnostic prediction

Rudy Morel, Jiequn Han, Edouard Oyallon

2025 11 citations

Multiple Physics Pretraining for Spatiotemporal Surrogate Models

Michael McCabe, Bruno Régaldo-Saint Blancard, L. Parker et al.

2024 43 citations

Weak Adversarial Networks for High-dimensional Partial Differential Equations

Yaohua Zang, Gang Bao, X. Ye et al.

2019 489 citations

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

Mahmoud Assran, Quentin Duval, Ishan Misra et al.

2023 686 citations

Unsupervised Deep Learning Algorithm for PDE-based Forward and Inverse Problems

Leah Bar, N. Sochen

2019 77 citations

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee et al.

2019 111519 citations

DGM: A deep learning algorithm for solving partial differential equations

Justin A. Sirignano, K. Spiliopoulos

2017 2375 citations

ImageNet: A large-scale hierarchical image database

Jia Deng, Wei Dong, R. Socher et al.

2009 71625 citations

Multiple Physics Pretraining for Physical Surrogate Models

Michael McCabe, Bruno Régaldo-Saint Blancard, L. Parker et al.

2023 91 citations

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Mahmoud Assran, Adrien Bardes, David Fan et al.

2025 235 citations

Emerging Properties in Self-Supervised Vision Transformers

Mathilde Caron, Hugo Touvron, Ishan Misra et al.

2021 8474 citations

Improved Baselines with Momentum Contrastive Learning

Xinlei Chen, Haoqi Fan, Ross B. Girshick et al.

2020 3861 citations

Learning fast, accurate, and stable closures of a kinetic theory of an active fluid

S. Maddu, Scott Weady, Michael J. Shelley

2023 11 citations

Revisiting Feature Prediction for Learning Visual Representations from Video

Adrien Bardes, Q. Garrido, Jean Ponce et al.

2024 219 citations

Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation

Jingmin Sun, Yuxuan Liu, Zecheng Zhang et al.

2024 42 citations

VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

Adrien Bardes, J. Ponce, Yann LeCun

2021 1163 citations

A Simple Framework for Contrastive Learning of Visual Representations

Ting Chen, Simon Kornblith, Mohammad Norouzi et al.

2020 23310 citations

A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27

Yann LeCun, Courant

2022 680 citations

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel, Noam Shazeer, Adam Roberts et al.

2019 24781 citations

Masked Autoencoders Are Scalable Vision Learners

Kaiming He, Xinlei Chen, Saining Xie et al.

2021 10732 citations