Real-Time Whole-Body Teleoperation of a Humanoid Robot Using IMU-Based Motion Capture with Sim2Sim and Sim2Real Validation
Real-time whole-body teleoperation using Virdyn IMU motion capture, validated on Unitree G1 robot with Sim2Sim and Sim2Real.
Key Findings
Methodology
This paper presents a lightweight real-time whole-body teleoperation pipeline using a Virdyn IMU motion capture suit to map human motion directly onto a Unitree G1 robot. The system operates without learning or offline preprocessing, employing a unified kinematic retargeting algorithm that functions identically in both simulation and physical hardware. Validated through Sim2Sim in the MuJoCo simulator and Sim2Real on the physical platform, the system demonstrates stability and low latency.
Key Results
- Result 1: In MuJoCo simulation, the system stably reproduces a wide range of motions, including walking, standing, sitting, turning, bowing, and coordinated gestures, with no perceptible latency.
- Result 2: On the physical Unitree G1 robot, the system achieves real-time motion reproduction, with no noticeable delay between the operator's motion and the robot's execution, confirming Sim2Real effectiveness.
- Result 3: Experiments show the system can transfer directly from simulation to real hardware without additional domain adaptation or parameter tuning.
Significance
This research establishes a practical, scalable framework for whole-body humanoid teleoperation using commodity wearable motion capture hardware. Unlike traditional learning-based methods, this system requires no training data or complex domain randomization, enabling low-latency real-time operation in real-world scenarios. Its significance extends to both academia and industry, offering new possibilities in applications requiring rapid response and stable control.
Technical Contribution
The technical contributions include a learning-free motion retargeting algorithm that seamlessly operates on both simulation and real hardware. Compared to existing reinforcement-learning-based whole-body controllers, this system requires no training data or complex reward engineering, supporting true real-time operation. Additionally, the system's behavior is fully interpretable and deterministic, advantageous in safety-critical deployment scenarios.
Novelty
The novelty lies in achieving real-time whole-body teleoperation with an IMU suit and Sim2Real transfer without any learning or offline processing. The kinematic retargeting design is physics-agnostic, so in principle it can be applied to other humanoid platforms, although retargeting parameters may still need manual adjustment for each platform.
Limitations
- Limitation 1: The system relies on the Unitree G1's onboard servo controllers for low-level stability. In highly dynamic motions, such as rapid direction changes or large-amplitude arm swings, the robot's balance could be improved.
- Limitation 2: The EMA filter introduces a small, motion-speed-dependent phase lag, potentially affecting fast motion responsiveness.
- Limitation 3: The current system does not integrate hand manipulation, which could be expanded using finger-level IMU data or vision-based hand tracking.
Future Work
Future directions include: automatic retargeting parameter adaptation for other humanoid platforms; integration with imitation learning pipelines to bootstrap neural whole-body controllers from teleoperated demonstrations; and extension to dexterous hand manipulation using finger-level IMU data or vision-based hand tracking.
AI Executive Summary
Whole-body teleoperation technology plays a crucial role in robotics, especially when robots are required to perform complex tasks in unstructured environments. However, existing methods often rely on pre-scripted motion playback, offline trajectory optimization, or learning-based controllers, which face challenges in practical applications due to complex tuning, large data requirements, and lengthy training times.
This paper introduces an innovative real-time whole-body teleoperation system that uses a Virdyn IMU motion capture suit to map human movements directly onto a Unitree G1 robot. The system operates without any offline buffering or learning components, employing a unified kinematic retargeting algorithm that functions identically in both simulation and physical hardware.
The core technical principles include: computing equivalent angles via geometric projection to preserve the operator's motion intent while remaining within the robot's physical range; applying a lightweight exponential moving average filter to smooth high-frequency noise in IMU estimates; and synchronizing upper-body, lower-body, and torso motions within a single retargeting step to keep the robot's center-of-mass trajectory consistent with the operator's posture.
Experimental results demonstrate that the system can stably reproduce a wide range of motions, including walking, standing, sitting, turning, bowing, and coordinated gestures. Sim2Sim validation in the MuJoCo simulator and Sim2Real validation on the physical Unitree G1 robot both show the system's stability and low latency.
This research establishes a practical, scalable framework for whole-body humanoid teleoperation using commodity wearable motion capture hardware. Unlike traditional learning-based methods, this system requires no training data or complex domain randomization, enabling low-latency real-time operation in real-world scenarios.
Despite its impressive performance, the system's balance in highly dynamic motions could be improved. Additionally, the EMA filter introduces a small phase lag, which could be further reduced using an adaptive Kalman filter. Future research directions include automatic retargeting parameter adaptation for other humanoid platforms and integration with imitation learning pipelines.
Deep Analysis
Background
In recent years, humanoid robots have advanced rapidly, particularly in performing complex operational tasks. However, achieving stable, natural whole-body motion remains a challenge. Traditional methods, such as pre-scripted motion playback and offline trajectory optimization, perform well in controlled environments but often require extensive tuning and data in practical applications. Learning-based controllers, such as reinforcement learning, can achieve robust motion control in some cases but involve complex training processes and high demands for data and computational resources.
Motion capture technology offers a direct and intuitive way to teleoperate humanoid robots by retargeting human natural motion onto robots in real-time. However, due to kinematic discrepancies between humans and robots, inherent high-frequency noise in IMU sensors, and challenges in transferring from simulation to physical hardware, achieving this goal is not straightforward.
This paper proposes an innovative solution by developing a learning-free real-time whole-body teleoperation system using a Virdyn IMU motion capture suit and a Unitree G1 robot. The system operates seamlessly on both simulation and real hardware, demonstrating its potential in complex operational tasks.
Core Problem
Achieving stable, low-latency whole-body teleoperation is an open research challenge in the field of humanoid robots. The main difficulties include: kinematic discrepancies between human and robot structures leading to motion retargeting issues; high-frequency noise inherent in IMU sensors affecting the accuracy of pose estimation; the need for joint-limit-safe motion at robot control rates; and the risk of instability when transitioning from simulation to physical hardware.
These issues pose challenges for existing teleoperation systems in terms of stability and responsiveness, particularly in scenarios requiring rapid response and complex actions. Therefore, developing a system capable of stable, low-latency whole-body teleoperation without relying on learning or offline processing is of significant importance.
Innovation
The core innovations of this paper include the development of a learning-free real-time whole-body teleoperation system that operates seamlessly on both simulation and real hardware.
- Motion Retargeting Algorithm: Computes equivalent angles via geometric projection to preserve the operator's motion intent while remaining within the robot's physical range. This approach avoids complex learning processes, enabling rapid deployment.
- Real-Time Smoothing Filter: Applies a lightweight exponential moving average filter to smooth high-frequency noise in IMU estimates, trading a small amount of responsiveness for stability.
- Synchronized Motion Control: Synchronizes upper-body, lower-body, and torso motions within a single retargeting step to keep the robot's center-of-mass trajectory consistent with the operator's posture. This synchronization supports the overall stability of the robot.
These innovations allow the system to transfer directly from simulation to real hardware without additional domain adaptation or parameter tuning.
Methodology
The paper presents a lightweight real-time whole-body teleoperation pipeline with the following methodology:
- Motion Capture: Uses a Virdyn IMU motion capture suit to record human full-body motion data in real-time. The suit is equipped with inertial sensors distributed across major body segments, estimating segment orientations and joint angles without relying on external cameras or optical markers.
- Motion Retargeting: Maps human skeleton joints to their closest functional counterparts on the Unitree G1 kinematic tree. For structural differences that prevent direct correspondence (e.g., human hip complex vs. robot's three-DoF hip joint), computes equivalent angles via geometric projection.
- Joint Limit Enforcement: Clips every mapped joint command to the robot's hardware joint limits before transmission. Soft limits are applied to prevent actuator saturation, especially for high-velocity motions.
- Real-Time Smoothing: Applies a lightweight exponential moving average filter per joint to smooth high-frequency noise in IMU estimates, ensuring stability and responsiveness.
- Synchronization: Synchronizes upper-body, lower-body, and torso motions within a single retargeting step, ensuring the robot's center-of-mass trajectory remains consistent with the operator's posture. The entire pipeline operates in a tight loop with no buffering or batch processing.
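The steps above can be sketched as a single per-frame function. This is a minimal illustration, not the authors' implementation: the joint names, the `JOINT_MAP` table, the limit values, and the smoothing factor `ALPHA` are all hypothetical, and the geometric-projection step for mismatched joints is left out (a direct one-to-one mapping is assumed).

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from mocap joint names to robot joint names.
JOINT_MAP: Dict[str, str] = {
    "RightElbow": "right_elbow_joint",
    "LeftKnee": "left_knee_joint",
}

# Hypothetical (lower, upper) limits in radians; not the real G1 values.
JOINT_LIMITS: Dict[str, Tuple[float, float]] = {
    "right_elbow_joint": (-1.0, 2.0),
    "left_knee_joint": (0.0, 2.5),
}

ALPHA = 0.5  # assumed EMA smoothing factor, 0 < ALPHA <= 1


def retarget_step(
    human_angles: Dict[str, float],
    prev_command: Optional[Dict[str, float]],
) -> Dict[str, float]:
    """Map, clip, and smooth one frame; all joints are handled in the same step."""
    command: Dict[str, float] = {}
    for human_joint, robot_joint in JOINT_MAP.items():
        # 1. Retarget: assumed direct one-to-one mapping.
        angle = human_angles[human_joint]
        # 2. Enforce joint limits before anything is sent to the robot.
        lower, upper = JOINT_LIMITS[robot_joint]
        angle = min(max(angle, lower), upper)
        # 3. Smooth with an EMA against the previous command (skipped on frame 0).
        if prev_command is not None:
            angle = ALPHA * angle + (1.0 - ALPHA) * prev_command[robot_joint]
        command[robot_joint] = angle
    return command


# Usage: two consecutive frames; the first elbow reading exceeds its limit.
first = retarget_step({"RightElbow": 3.0, "LeftKnee": 1.0}, None)
second = retarget_step({"RightElbow": 1.0, "LeftKnee": 1.0}, first)
```

Clipping before smoothing means every filtered command is a weighted average of in-range values, so it also stays within the limits. Whether the paper applies the two stages in this order is an assumption based on the order in which the steps are listed.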
Experiments
The experimental design includes Sim2Sim validation in the MuJoCo physics simulator and Sim2Real validation on the physical Unitree G1 robot.
- Datasets: Real-time full-body motion data recorded using the Virdyn IMU motion capture suit.
- Baselines: Compared with traditional learning-based whole-body controllers to evaluate the system's performance in terms of stability and responsiveness.
- Metrics: Key evaluation metrics include motion stability, responsiveness, and retargeting accuracy.
- Hyperparameters: The time constant of the lightweight exponential moving average filter is tuned to balance noise attenuation and motion responsiveness.
- Ablation Studies: The system's performance is tested across different motion categories (e.g., walking, sitting, turning) to validate its effectiveness in various scenarios.
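To illustrate the time-constant trade-off mentioned under Hyperparameters, the sketch below converts a filter time constant into a per-sample EMA smoothing factor using the standard first-order discretization `alpha = 1 - exp(-dt / tau)`. The source does not state which parameterization or control rate the authors use, so the function name `ema_alpha` and the 50 Hz rate are assumptions for illustration only.

```python
import math


def ema_alpha(time_constant_s: float, control_rate_hz: float) -> float:
    """Smoothing factor for y[t] = alpha * x[t] + (1 - alpha) * y[t-1]."""
    if time_constant_s <= 0 or control_rate_hz <= 0:
        raise ValueError("time constant and control rate must be positive")
    dt = 1.0 / control_rate_hz  # seconds between samples
    return 1.0 - math.exp(-dt / time_constant_s)


# A longer time constant gives a smaller alpha: more smoothing, slower response.
RATE_HZ = 50.0  # hypothetical control rate
for tau in (0.02, 0.05, 0.2):
    print(f"tau={tau:.2f} s -> alpha={ema_alpha(tau, RATE_HZ):.3f}")
```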
Results
Experimental results demonstrate that the system can stably reproduce a wide range of motions, including walking, standing, sitting, turning, bowing, and coordinated gestures.
- Sim2Sim validation in the MuJoCo simulator shows that the system produces physically plausible configurations with no joint limit violations, self-collisions, or abrupt discontinuities in joint velocity.
- Sim2Real validation on the physical Unitree G1 robot achieves real-time motion reproduction, with no noticeable delay between the operator's motion and the robot's execution, confirming Sim2Real effectiveness.
- Experiments show the system can transfer directly from simulation to real hardware without additional domain adaptation or parameter tuning.
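Two of the Sim2Sim criteria above, joint limit violations and abrupt joint-velocity discontinuities, can be checked offline on a logged command trajectory. The sketch below is one hypothetical way to do so; it does not use the MuJoCo API, does not cover self-collision, and the function name, limits, and velocity threshold are illustrative assumptions rather than values from the paper.

```python
from typing import List, Sequence, Tuple


def check_trajectory(
    trajectory: Sequence[Sequence[float]],
    limits: Sequence[Tuple[float, float]],
    dt: float,
    max_velocity: float,
) -> List[str]:
    """Return human-readable violations; an empty list means the trajectory passes."""
    problems: List[str] = []
    for t, frame in enumerate(trajectory):
        for j, (angle, (lower, upper)) in enumerate(zip(frame, limits)):
            # Joint-limit check on every frame.
            if not lower <= angle <= upper:
                problems.append(f"frame {t}, joint {j}: angle {angle:.3f} out of range")
            # Finite-difference velocity check from the second frame onward.
            if t > 0:
                velocity = (angle - trajectory[t - 1][j]) / dt
                if abs(velocity) > max_velocity:
                    problems.append(f"frame {t}, joint {j}: velocity {velocity:.2f} rad/s too high")
    return problems


# Usage with one joint sampled every 20 ms (all numbers are made up).
limits = [(-1.0, 1.0)]
smooth = [[0.0], [0.01], [0.02]]
jumpy = [[0.0], [0.5], [1.2]]
smooth_report = check_trajectory(smooth, limits, dt=0.02, max_velocity=5.0)
jumpy_report = check_trajectory(jumpy, limits, dt=0.02, max_velocity=5.0)
```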
Applications
The system's application scenarios include:
- Remote Operation: Enables stable, low-latency remote operation in scenarios requiring rapid response and complex actions, such as disaster recovery and hazardous environment operations.
- Human-Robot Interaction: Provides a more intuitive operation experience in scenarios requiring natural, fluid human-robot interaction, such as entertainment and education.
- Industrial Automation: Improves production efficiency and operational safety in industrial scenarios requiring precise control and rapid response, such as assembly lines and quality inspection.
Limitations & Outlook
The system has several limitations.
- The system relies on the Unitree G1's onboard servo controllers for low-level stability. In highly dynamic motions, such as rapid direction changes or large-amplitude arm swings, the robot's balance could be improved.
- The EMA filter introduces a small, motion-speed-dependent phase lag, potentially affecting fast motion responsiveness. Future work could explore using an adaptive Kalman filter to further reduce this effect.
- The current system does not integrate hand manipulation, which could be expanded using finger-level IMU data or vision-based hand tracking.
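The speed dependence of the EMA lag noted above can be made concrete with a short derivation. It assumes the standard first-order EMA form with smoothing factor \alpha (the source does not give the exact filter equation) and a joint angle that moves at a constant rate of v radians per sample:

```latex
\begin{aligned}
y_t &= \alpha\, x_t + (1-\alpha)\, y_{t-1}, \qquad 0 < \alpha \le 1,\\
x_t &= v\,t \quad\Longrightarrow\quad y_t \to v\,t - e \quad (t \to \infty),\\
v\,t - e &= \alpha\, v\,t + (1-\alpha)\bigl(v\,(t-1) - e\bigr)
\quad\Longrightarrow\quad e = \frac{1-\alpha}{\alpha}\, v .
\end{aligned}
```

Under these assumptions the filtered command trails the input by (1-\alpha)/\alpha samples, so the angular tracking error e grows in proportion to joint speed: slow gestures are almost unaffected, while fast motions lag visibly unless \alpha is raised at the cost of passing more sensor noise.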
Plain Language (accessible to non-experts)
Imagine you're controlling a remote-controlled robot that can mimic your every move. You're wearing a special suit equipped with sensors that capture your every action. These sensors act like the robot's eyes and ears, transmitting your movements to the robot in real-time.
When you raise your arm, the robot raises its arm too; when you turn around, the robot follows suit. This process is like playing a life-sized video game where you're the character, and the robot is your avatar.
To ensure the robot accurately mimics your movements, the system makes some adjustments along the way. For example, it smooths out jitter in the sensor readings and keeps every joint within the range the robot can safely reach, so a sudden or extreme movement does not push the robot beyond what its body allows.
What's special about this system is that it doesn't require any complex learning processes or pre-programming. As soon as you wear the suit, the robot can start mimicking your movements immediately. This technology can be applied in many fields, such as remote operation, virtual reality, and human-robot interaction.
ELI14 (explained like you're 14)
Hey there! Imagine if you could control a robot and make it copy everything you do. Sounds cool, right? It's like playing a super realistic video game where you're the main character!
Scientists have built a suit with lots of tiny sensors that can capture your movements. Then, these movements are sent to a robot, and the robot copies you. So, if you wave your hand, the robot waves its hand; if you bow, the robot bows too!
This system is super cool because it doesn't need to teach the robot how to move beforehand. As soon as you put on the suit, the robot can start copying you. Plus, the same setup works both in a computer simulation and on the real robot, without any changes!
In the future, this technology could be used in many areas, like helping doctors perform surgeries or letting us experience different worlds in virtual reality. Exciting, right?
Glossary
IMU (Inertial Measurement Unit)
An IMU is a sensor device that measures an object's acceleration and rotational rate. It's commonly used in motion capture and navigation systems.
In this paper, the IMU is used to capture human full-body motion data for robot teleoperation.
Motion Retargeting
Motion retargeting is the process of transferring motion recorded on one skeleton to a body with a different structure, here from a human operator to a humanoid robot.
The paper proposes a learning-free motion retargeting algorithm to map human movements onto the Unitree G1 robot.
Sim2Sim (Simulation to Simulation)
Sim2Sim is a validation step in which a system is tested in a physics simulator before it runs on real hardware; the term usually refers to transferring a controller from one simulation environment to another.
The paper conducts Sim2Sim validation in the MuJoCo simulator to evaluate the effectiveness of the motion retargeting algorithm.
Sim2Real (Simulation to Real)
Sim2Real refers to the process of transferring technology from a simulation environment to a real-world environment, typically used to validate a system's performance on real hardware.
The paper performs Sim2Real validation on the physical Unitree G1 robot, demonstrating the system's stability and low latency.
Virdyn IMU Motion Capture Suit
The Virdyn IMU motion capture suit is a device equipped with inertial sensors used to capture human full-body motion data in real-time.
The paper uses this suit to record human motion data for robot teleoperation.
Unitree G1 Robot
The Unitree G1 is a humanoid robot with multiple degrees of freedom, commonly used for research and development in human-robot interaction technologies.
The paper maps human motion data onto the Unitree G1 robot to achieve whole-body teleoperation.
Exponential Moving Average Filter (EMA)
An EMA is a filter used to smooth data, reducing high-frequency noise while maintaining data responsiveness.
The paper applies an EMA filter to smooth high-frequency noise in IMU estimates, ensuring motion stability.
Geometric Projection
Geometric projection is a mathematical method used to map points from one space to another, often used to compute equivalent angles.
The paper uses geometric projection to compute equivalent angles, ensuring the accuracy of motion retargeting.
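As a rough illustration of how an equivalent angle might be obtained by projection, the sketch below projects a limb direction vector onto the plane perpendicular to a robot joint axis and measures the signed angle from a reference direction. This is a generic construction, not the paper's formula; the function name `projected_angle` and the example vectors are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def projected_angle(segment: Vec3, axis: Vec3, reference: Vec3) -> float:
    """Signed angle (rad) about `axis` from `reference` to the projected `segment`.

    `axis` must be unit length and `reference` perpendicular to it.
    Returns 0.0 if the segment is parallel to the axis (atan2(0, 0)).
    """
    # Remove the component of the segment that lies along the joint axis.
    along = dot(segment, axis)
    projected = (
        segment[0] - along * axis[0],
        segment[1] - along * axis[1],
        segment[2] - along * axis[2],
    )
    # atan2 of the sine-like and cosine-like terms gives a signed angle.
    return math.atan2(dot(axis, cross(reference, projected)), dot(reference, projected))


# Usage: a limb pointing along +y with some out-of-plane (z) component,
# measured about the z axis from the +x direction.
angle = projected_angle((0.0, 1.0, 0.5), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
```

The out-of-plane component is discarded, which is what lets a single-axis robot joint approximate a human joint with more freedom of movement.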
Joint Limit
Joint limits refer to the maximum and minimum angle ranges a robot's joints can achieve during motion to prevent mechanical damage.
The paper clips all mapped joint commands to the robot's hardware joint limits before transmission.
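A minimal sketch of such clipping is shown below. Treating a "soft limit" as a safety margin inside the hardware range is an assumption made here for illustration; the source does not define how its soft limits are computed, and the numeric values are made up.

```python
def clip_to_limits(
    angle: float,
    lower: float,
    upper: float,
    soft_margin: float = 0.0,
) -> float:
    """Clamp `angle` to [lower + soft_margin, upper - soft_margin] (radians)."""
    if soft_margin < 0 or 2 * soft_margin >= upper - lower:
        raise ValueError("soft_margin must be non-negative and leave a usable range")
    return min(max(angle, lower + soft_margin), upper - soft_margin)


# Usage with a hypothetical joint range of [-1.0, 2.0] rad.
hard = clip_to_limits(2.3, -1.0, 2.0)                    # clamped at the hardware limit
soft = clip_to_limits(2.3, -1.0, 2.0, soft_margin=0.1)   # clamped inside the margin
inside = clip_to_limits(0.5, -1.0, 2.0, soft_margin=0.1) # already in range, unchanged
```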
Synchronized Control
Synchronized control coordinates motion among multiple components to ensure overall system stability and consistency.
The paper synchronizes upper-body, lower-body, and torso motions within a single retargeting step to maintain the robot's center-of-mass trajectory.
Open Questions (unanswered questions from this research)
- 1. How can the robot's balance be improved in highly dynamic motions? The current system relies on the Unitree G1's onboard servo controllers for low-level stability, but in highly dynamic motions, such as rapid direction changes or large-amplitude arm swings, the robot's balance could be improved. Future research could explore integrating an online whole-body momentum controller or a model-predictive footstep planner to enhance balance.
- 2. How can the phase lag introduced by the EMA filter be reduced? While the EMA filter smooths high-frequency noise in IMU estimates, it also introduces a small, motion-speed-dependent phase lag that may affect fast motion responsiveness. Future work could explore using an adaptive Kalman filter to further reduce this effect.
- 3. How can automatic retargeting parameter adaptation be achieved for different humanoid platforms? While the current system's motion retargeting algorithm operates seamlessly on both simulation and real hardware, it may require manual parameter adjustment on different humanoid platforms. Future research could explore automatic parameter adaptation methods to enhance system generality.
- 4. How can hand manipulation be integrated to achieve more complex tasks? The current system does not integrate hand manipulation, which could be expanded using finger-level IMU data or vision-based hand tracking to support more complex tasks and operational scenarios.
- 5. How can integration with imitation learning pipelines guide neural whole-body controllers? Although the proposed system operates without learning, integration with imitation learning pipelines could guide neural whole-body controllers from teleoperated demonstrations. Future research could explore the feasibility and potential of this integration method.
Applications
Immediate Applications
Remote Operation
Enables stable, low-latency remote operation in scenarios requiring rapid response and complex actions, such as disaster recovery and hazardous environment operations.
Human-Robot Interaction
Provides a more intuitive operation experience in scenarios requiring natural, fluid human-robot interaction, such as entertainment and education.
Industrial Automation
Improves production efficiency and operational safety in industrial scenarios requiring precise control and rapid response, such as assembly lines and quality inspection.
Long-term Vision
Medical Assistance
In the future, this technology could be applied in medical assistance fields, such as remote surgery and rehabilitation training, providing more precise and personalized medical services.
Virtual Reality
In the virtual reality field, this technology can enhance user immersion and interaction experience, driving the development and application of virtual reality technology.
Abstract
Stable, low-latency whole-body teleoperation of humanoid robots is an open research challenge, complicated by kinematic mismatches between human and robot morphologies, accumulated inertial sensor noise, non-trivial control latency, and persistent sim-to-real transfer gaps. This paper presents a complete real-time whole-body teleoperation system that maps human motion, recorded with a Virdyn IMU-based full-body motion capture suit, directly onto a Unitree G1 humanoid robot. We introduce a custom motion-processing, kinematic retargeting, and control pipeline engineered for continuous, low-latency operation without any offline buffering or learning-based components. The system is first validated in simulation using the MuJoCo physics model of the Unitree G1 (sim2sim), and then deployed without modification on the physical platform (sim2real). Experimental results demonstrate stable, synchronized reproduction of a broad motion repertoire, including walking, standing, sitting, turning, bowing, and coordinated expressive full-body gestures. This work establishes a practical, scalable framework for whole-body humanoid teleoperation using commodity wearable motion capture hardware.