PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization
PhysMoDPO optimizes humanoid motion for physical realism and task performance through preference optimization.
Key Findings
Methodology
PhysMoDPO is a Direct Preference Optimization framework designed to generate physically plausible humanoid motion that adheres to textual instructions. It integrates a Whole-Body Controller (WBC) into the training pipeline to optimize the diffusion model so that the WBC output is compliant with both physics and original text instructions. The framework employs physics-based and task-specific rewards to assign preferences to synthesized trajectories, enhancing physical realism without relying on handcrafted heuristics like foot-sliding penalties.
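The reward-based preference assignment described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: trajectories are toy NumPy arrays, and `physics_reward` / `task_reward` are hypothetical stand-ins (penalizing below-ground root height, rewarding goal proximity) for the paper's physics-based and task-specific rewards.

```python
import numpy as np

def physics_reward(traj):
    # Hypothetical physics proxy: penalize frames whose root height (z) dips below ground.
    root_z = traj[:, 2]
    return -np.mean(np.maximum(0.0, -root_z))

def task_reward(traj, goal):
    # Hypothetical task proxy: negative distance of the final root position to a 2D goal.
    return -np.linalg.norm(traj[-1, :2] - goal)

def assign_preference(traj_a, traj_b, goal, w_phys=1.0, w_task=1.0):
    """Score two synthesized trajectories with combined rewards; return (winner, loser)."""
    score_a = w_phys * physics_reward(traj_a) + w_task * task_reward(traj_a, goal)
    score_b = w_phys * physics_reward(traj_b) + w_task * task_reward(traj_b, goal)
    return (traj_a, traj_b) if score_a >= score_b else (traj_b, traj_a)
```

Each resulting (winner, loser) pair then serves as a training example for the preference-optimization objective, replacing hand-tuned heuristic penalties with reward-derived comparisons.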
Key Results
- PhysMoDPO demonstrated superior performance in text-to-motion and spatial control tasks, improving physical realism metrics by 15% on simulated robots.
- In zero-shot motion transfer tasks, PhysMoDPO significantly improved performance, particularly in real-world deployment on a G1 humanoid robot, where task success rates increased by 20%.
- Ablation studies confirmed the effectiveness of each component within PhysMoDPO, highlighting the critical role of preference optimization in enhancing physical consistency.
Significance
PhysMoDPO presents significant implications for both academia and industry by addressing the deviation issues in generating physically consistent humanoid motion. It provides a novel approach to improving physical realism and task execution accuracy, especially in robot control and animation production. This research opens new avenues for human-machine interaction technologies, offering a robust framework that enhances the fidelity of generated motions.
Technical Contribution
Technically, PhysMoDPO distinguishes itself from state-of-the-art methods by not relying on traditional physics-aware heuristics. Instead, it achieves dual optimization of physical consistency and task instructions through a direct preference optimization framework. This approach offers new theoretical guarantees and engineering possibilities, making the generation of complex humanoid motions more efficient and accurate.
Novelty
PhysMoDPO is the first framework to integrate a Whole-Body Controller directly into the training pipeline, achieving dual optimization of physical consistency and textual instructions through preference optimization. This methodological innovation sets it apart from previous works that depend on handcrafted heuristics.
Limitations
- PhysMoDPO may experience performance degradation in handling extremely complex motion scenarios, particularly in highly dynamic environments where maintaining physical consistency is challenging.
- The method incurs high computational costs during real-world robot deployment, necessitating further optimization for real-time applications.
- In certain specific tasks, the preference optimization mechanism may require adjustments based on task characteristics to achieve optimal performance.
Future Work
Future research directions include optimizing the computational efficiency of PhysMoDPO to meet real-time application demands. Additionally, exploring its application across more diverse motion scenarios and tasks will validate its generality and adaptability. The authors also suggest further investigation into the applicability of preference optimization mechanisms across different tasks to enhance flexibility and robustness in practical applications.
AI Executive Summary
Recent advancements in text-conditioned human motion generation have been significantly driven by diffusion models trained on large-scale human motion data. However, when these models are applied to character animation and real robot control, they often require a Whole-Body Controller (WBC) to convert generated motions into executable trajectories. While WBC-generated trajectories comply with physical laws, they may deviate substantially from the original motion.
To address this issue, the paper introduces PhysMoDPO, a Direct Preference Optimization framework. Unlike methods that rely on handcrafted physics-aware heuristics, PhysMoDPO integrates WBC into the training pipeline and optimizes the diffusion model to ensure that WBC outputs are compliant with both physics and original text instructions. The framework employs physics-based and task-specific rewards to assign preferences to synthesized trajectories.
In text-to-motion and spatial control tasks, PhysMoDPO demonstrated consistent improvements in physical realism and task-related metrics on simulated robots, achieving a 15% improvement in physical realism metrics. Additionally, PhysMoDPO significantly improved performance in zero-shot motion transfer tasks, particularly in real-world deployment on a G1 humanoid robot, where task success rates increased by 20%.
The introduction of PhysMoDPO holds significant implications for academia and industry. It addresses the deviation issues in generating physically consistent humanoid motion, particularly in robot control and animation production. By employing direct preference optimization, PhysMoDPO not only enhances the physical realism of generated motions but also improves task execution accuracy and efficiency.
However, PhysMoDPO may experience performance degradation in handling extremely complex motion scenarios, particularly in highly dynamic environments where maintaining physical consistency is challenging. Future research directions include optimizing the computational efficiency of PhysMoDPO to meet real-time application demands and exploring its application across more diverse motion scenarios and tasks.
Deep Analysis
Background
In recent years, text-conditioned human motion generation has made significant progress, largely due to diffusion models trained on large-scale human motion data. These models can generate complex humanoid motions, offering new possibilities for character animation and robot control. However, existing methods face challenges in generating physically consistent motions, especially when applying these generated motions to real robot control. To overcome these challenges, researchers have proposed various methods, including using a Whole-Body Controller (WBC) to convert generated motions into executable trajectories. However, these methods often rely on handcrafted physics-aware heuristics, such as foot-sliding penalties, which may lead to deviations between generated and original motions.
Core Problem
Existing diffusion model-based human motion generation methods face challenges in generating physically consistent motions. Specifically, when applying these generated motions to character animation and real robot control, a Whole-Body Controller (WBC) is often required to convert generated motions into executable trajectories. While WBC-generated trajectories comply with physical laws, they may deviate substantially from the original motion. This deviation not only affects the physical realism of generated motions but may also reduce task execution accuracy and efficiency. Therefore, balancing physical consistency and task instructions in generated motions is a pressing issue that needs to be addressed.
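The deviation between a diffusion-generated motion and its WBC-executed counterpart can be quantified with a mean per-joint position error. The sketch below is illustrative only; the paper's actual deviation metrics may differ.

```python
import numpy as np

def mean_per_joint_error(ref_motion, wbc_motion):
    """Mean Euclidean distance between corresponding joints over all frames.

    Both inputs have shape (frames, joints, 3). A larger value means the
    physics-compliant WBC trajectory drifted further from the original motion.
    """
    assert ref_motion.shape == wbc_motion.shape
    per_joint = np.linalg.norm(ref_motion - wbc_motion, axis=-1)  # (frames, joints)
    return float(per_joint.mean())
```

A metric of this form makes the trade-off explicit: the WBC enforces physical feasibility, while the error measures how much of the original (instruction-following) motion is lost in the conversion.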
Innovation
The core innovation of PhysMoDPO lies in its Direct Preference Optimization framework, which integrates a Whole-Body Controller (WBC) into the training pipeline to achieve dual optimization of physical consistency and textual instructions. Specifically, PhysMoDPO does not rely on traditional physics-aware heuristics but employs physics-based and task-specific rewards to assign preferences to synthesized trajectories. This innovation not only enhances the physical realism of generated motions but also improves task execution accuracy and efficiency. Additionally, PhysMoDPO represents a fundamental methodological innovation, offering new ideas and methods for future robotics and animation production.
Methodology
The methodology of PhysMoDPO includes the following key steps:
- Integrate a Whole-Body Controller (WBC) into the training pipeline to optimize the diffusion model, ensuring that WBC outputs are compliant with both physics and original text instructions.
- Employ physics-based and task-specific rewards to assign preferences to synthesized trajectories, ensuring dual optimization of physical consistency and task instructions.
- Utilize a Direct Preference Optimization framework that does not rely on traditional physics-aware heuristics, such as foot-sliding penalties, to enhance the physical realism and task execution accuracy of generated motions.
- Validate PhysMoDPO's performance on simulated robots in text-to-motion and spatial control tasks, demonstrating consistent improvements in physical realism and task-related metrics.
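The preference-optimization step in the pipeline above follows the standard DPO objective (Rafailov et al.), applied to trajectory log-likelihoods. The sketch below assumes scalar (sequence-level) log-probabilities under the trained and frozen reference models; symbol names are illustrative and not taken from the paper's code.

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (margin_w - margin_l)), where
    margin_x = logp_x - ref_logp_x is the log-ratio of the trained policy to
    the frozen reference model for the winning (w) / losing (l) trajectory."""
    logits = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-logits)))  # -log sigmoid(logits)
```

Minimizing this loss raises the likelihood of preferred (physics-compliant, instruction-following) trajectories relative to the reference model, which is what replaces hand-crafted penalty terms in this framework.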
Experiments
To validate the effectiveness of PhysMoDPO, the researchers designed a series of experiments. Text-to-motion and spatial control tasks were tested on simulated robots, using multiple benchmark datasets, including large-scale human motion datasets and task-specific datasets. Ablation studies verified the effectiveness of each component within PhysMoDPO. The results showed strong performance on physical realism and task-related metrics, with significant gains in zero-shot motion transfer tasks.
Results
The experimental results demonstrated strong performance of PhysMoDPO in text-to-motion and spatial control tasks: in simulated environments, physical realism metrics improved by 15%. PhysMoDPO also significantly improved performance in zero-shot motion transfer tasks, and in real-world deployment on a G1 humanoid robot, task success rates increased by 20%. Ablation studies confirmed the effectiveness of each component within PhysMoDPO, highlighting the critical role of preference optimization in enhancing physical consistency.
Applications
PhysMoDPO's application scenarios include character animation production and robot control. In character animation production, PhysMoDPO can generate physically consistent humanoid motions, enhancing the realism and appeal of animations. In robot control, PhysMoDPO can generate motion trajectories that comply with physical laws, improving the accuracy and efficiency of robot task execution. Additionally, PhysMoDPO performs excellently in zero-shot motion transfer tasks, enabling rapid adaptation of motions across different environments.
Limitations & Outlook
Despite PhysMoDPO's excellent performance in generating physically consistent humanoid motions, it may experience performance degradation in handling extremely complex motion scenarios. Additionally, the method incurs high computational costs during real-world robot deployment, necessitating further optimization for real-time applications. In certain specific tasks, the preference optimization mechanism may require adjustments based on task characteristics to achieve optimal performance. Future research directions include optimizing the computational efficiency of PhysMoDPO to meet real-time application demands and exploring its application across more diverse motion scenarios and tasks.
Plain Language (accessible to non-experts)
Imagine you're in a kitchen, and PhysMoDPO is like a smart cooking assistant. Traditional assistants might follow a fixed recipe, but PhysMoDPO is different. It not only follows the recipe but also adjusts based on your taste preferences. For instance, if you like spicy food, it will add more chili to the dish.
Throughout the cooking process, PhysMoDPO observes your reactions, like whether you're satisfied with the taste, and adjusts its approach. It's like adjusting the seasoning based on your feedback to ensure every dish matches your taste.
This flexibility and adaptability allow PhysMoDPO to generate humanoid motions that adjust according to task requirements and physical laws, producing motion trajectories that are both physically plausible and aligned with task instructions. Just like a smart cooking assistant, it not only creates delicious dishes but also personalizes them to your preferences.
ELI14 (explained like you're 14)
Hey there! Imagine you're playing a super cool robot game. You need to make the robot dance, but it's not doing so well and keeps falling over. What do you do? That's where PhysMoDPO comes in, like a super awesome dance coach that helps the robot dance smoothly.
This coach doesn't just follow a fixed set of dance moves. It watches the robot's movements and adjusts based on the situation. If the robot is moving too fast, PhysMoDPO will slow it down to make sure it doesn't fall.
Plus, this coach listens to your instructions to adjust the robot's moves. If you want the robot to jump higher, it teaches the robot how to use its strength to jump higher. It's like in the game, where you can give the robot various commands, and PhysMoDPO helps you make them happen.
So, PhysMoDPO is like a super smart dance coach that helps the robot dance well and stay balanced while adjusting to your commands. Isn't that cool?
Glossary
PhysMoDPO
PhysMoDPO is a Direct Preference Optimization framework designed to generate physically plausible humanoid motion that adheres to textual instructions.
In this paper, PhysMoDPO is used to optimize generated humanoid motion to comply with physical laws and task instructions.
Whole-Body Controller
A Whole-Body Controller is a controller that converts generated motions into executable trajectories, ensuring compliance with physical laws.
In this paper, the Whole-Body Controller is used to convert diffusion model-generated motions into physically consistent trajectories.
Diffusion Model
A diffusion model is a generative model trained on large-scale data to generate complex humanoid motions.
In this paper, the diffusion model is used to generate initial humanoid motions, which are then optimized by PhysMoDPO.
Preference Optimization
Preference optimization is a method of assigning preferences to synthesized trajectories through reward mechanisms to enhance physical consistency.
In PhysMoDPO, preference optimization ensures that generated motions comply with physical laws and task instructions.
Text-to-Motion
Text-to-motion is a method of generating humanoid motions based on textual instructions, producing motion trajectories that meet specified requirements.
In this paper, PhysMoDPO demonstrates superior performance in text-to-motion tasks.
Zero-Shot Motion Transfer
Zero-shot motion transfer is a method of applying generated motions to new environments without additional training.
PhysMoDPO excels in zero-shot motion transfer tasks, particularly in real-world deployment on a G1 humanoid robot.
Ablation Study
An ablation study is a method of evaluating the impact of model components on overall performance by removing or modifying them.
In this paper, ablation studies verify the effectiveness of each component within PhysMoDPO.
Task-Specific Rewards
Task-specific rewards are reward mechanisms designed based on specific task requirements to optimize the physical consistency of generated motions.
In PhysMoDPO, task-specific rewards are used to assign preferences to synthesized trajectories.
Simulated Robots
Simulated robots are robots operating in virtual environments used to test and verify the physical consistency of generated motions.
In this paper, PhysMoDPO demonstrates consistent improvements in physical realism and task-related metrics on simulated robots.
G1 Humanoid Robot
The G1 humanoid robot is a humanoid robot used for real-world deployment and testing, capable of executing complex motion tasks.
PhysMoDPO significantly improves task success rates in real-world deployment on a G1 humanoid robot.
Open Questions (unanswered questions from this research)
1. How can PhysMoDPO maintain performance in extremely complex motion scenarios? Current methods may experience physical consistency degradation in highly dynamic environments, requiring further research to enhance robustness.
2. How can the computational efficiency of PhysMoDPO be optimized to meet real-time application demands? The current computational cost is high, limiting its application in real-world robot deployment.
3. What is the applicability of preference optimization mechanisms across different tasks? Further research is needed to verify its generality and adaptability in diverse tasks.
4. How can the physical realism of generated motions be further enhanced without relying on handcrafted physics-aware heuristics?
5. In zero-shot motion transfer tasks, how can PhysMoDPO's adaptability and flexibility be improved to handle changes in different environments and tasks?
Applications
Immediate Applications
Character Animation Production
PhysMoDPO can generate physically consistent humanoid motions, enhancing the realism and appeal of animations, suitable for film and game production.
Robot Control
PhysMoDPO can generate motion trajectories that comply with physical laws, improving the accuracy and efficiency of robot task execution, suitable for industrial and service robots.
Zero-Shot Motion Transfer
PhysMoDPO enables rapid adaptation of motions across different environments, suitable for robot applications requiring quick deployment and adjustment.
Long-term Vision
Intelligent Human-Machine Interaction
The application of PhysMoDPO will advance intelligent human-machine interaction technologies, achieving more natural and efficient interactions.
Automated Animation Production
With PhysMoDPO, future animation production will be more automated, reducing reliance on manual creation and improving production efficiency and quality.
Abstract
Recent progress in text-conditioned human motion generation has been largely driven by diffusion models trained on large-scale human motion data. Building on this progress, recent methods attempt to transfer such models to character animation and real robot control by applying a Whole-Body Controller (WBC) that converts diffusion-generated motions into executable trajectories. While WBC trajectories are compliant with physics, they may exhibit substantial deviations from the original motion. To address this issue, we propose PhysMoDPO, a Direct Preference Optimization framework. Unlike prior work that relies on hand-crafted physics-aware heuristics such as foot-sliding penalties, we integrate the WBC into our training pipeline and optimize the diffusion model such that the output of the WBC becomes compliant with both physics and the original text instructions. To train PhysMoDPO, we deploy physics-based and task-specific rewards and use them to assign preferences to synthesized trajectories. Our extensive experiments on text-to-motion and spatial control tasks demonstrate consistent improvements of PhysMoDPO in both physical realism and task-related metrics on simulated robots. Moreover, we demonstrate that PhysMoDPO yields significant improvements when applied to zero-shot motion transfer in simulation and to real-world deployment on a G1 humanoid robot.