Paper Insights - AI Arxiv Paper Analysis

cs.RO 2606.06323

VOLT: Vision and Language Trajectory Segmentation for Faster-than-Demonstration Policies

VOLT leverages vision-language models for trajectory segmentation, enabling robots to execute tasks up to 2.57× faster while maintaining success rates.

Robert Ramirez Sanchez, Daniel J. Evans, Dylan P. Losey et al.

2026-06-04 65

cs.RO 2606.06041

Sample-efficient Low-level Motion Planning for Robotic Manipulation Tasks via Zero-shot Transfer Learning

Proposes the iCEM+TL framework, which integrates transfer learning to boost the success rate of low-level robotic motion planning by 23% and enables zero-shot transfer to complex tasks.

Yuanzhi He, Victor Romero-Cano, José J. Patiño et al.

2026-06-04 62

cs.RO 2606.03985

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Humanoid-GPT employs a large-scale 2B-frame motion dataset and a GPT-style causal Transformer to achieve zero-shot high-dynamic motion tracking, surpassing shallow MLP trackers.

Zekun Qi, Xuchuan Chen, Dairu Liu et al.

2026-06-03 48

cs.RO 2606.03949

Preference-Calibrated Human-in-the-Loop Reinforcement Learning for Robotic Manipulation

The proposed PACT framework uses preference signals and task progress modeling to correct overestimated Q-values, boosting success rate by 24.5% and accelerating convergence by 1.3× in real robot tasks.

Zeyi Liu, Guangyao Liu, Yinuo Qu et al.

2026-06-03 53

cs.RO 2606.02370

A Simulation Platform for Flapping-Wing Vehicles

Introduces FWAV-Sim, a high-fidelity Unity-based simulation platform that integrates quasi-steady blade-element aerodynamics, fractal turbulence, and realistic sensor models to support the development of robust autonomous FWAV systems.

Haichuan Li, Tomi Westerlund

2026-06-01 59

cs.RO 2605.31486

Learning Controlled Separation of Small Objects Between Two Fingers with a Tactile Skin

Purely tactile deep RL enables a multi-finger robot to control the separation of small objects with success rates close to 94%, validated through real-world transfer.

Ulf Kasolowsky, Berthold Bäuml

2026-05-30 89

cs.RO 2605.31476

IDOL: Inverse-Dynamics-Guided Future Prediction for End-to-End Autonomous Driving

IDOL employs inverse dynamics to decode future scene transitions into motion features, significantly improving autonomous driving trajectory planning.

Chenghao Zhang, Timin Li, Dongmei Li

2026-05-30 84

cs.RO 2605.31460

On-Device Robotic Planning: Eliminating Inference Redundancy for Efficient Decision-Making

REIS combines lightweight scene gating and KV-guided inference, significantly reducing robotic reasoning redundancy for real-time decision-making.

Joonhee Lee, Hyunseung Shin, Hyunmi Kim et al.

2026-05-29 73

cs.RO 2605.31405

Adaptive Artificial Time-Delay Control with Barrier Lyapunov Constraints for Euler-Lagrange Robots

Proposes an adaptive control framework combining artificial time delay estimation with barrier Lyapunov functions for Euler-Lagrange robots, effectively handling state-dependent uncertainties and time-varying constraints.

Saksham Gupta, Rishabh Dev Yadav, Sarthak Mishra et al.

2026-05-29 66

cs.RO 2605.28726

How VLAs Fail Differently: Black-Box Action Monitoring Reveals Architecture-Specific Failure Signatures

This study reveals architecture-specific failure signatures in VLA models via black-box action monitoring, emphasizing the importance of architecture-matched monitors.

Krishnam Gupta

2026-05-28 118

cs.RO 2605.22816

AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation

AwareVLN introduces self-aware reasoning for VLN, achieving a navigation error (NE) of 4.02 on R2R-CE Val-Unseen and outperforming prior SOTA.

Wenxuan Guo, Xiuwei Xu, Yichen Liu et al.

2026-05-22 56

cs.RO 2605.22812

GesVLA: Gesture-Aware Vision-Language-Action Model Embedded Representations

GesVLA integrates gesture into Vision-Language-Action models, achieving 94.3% target grounding accuracy in complex real-world tasks.

Wenxuan Guo, Ziyuan Li, Meng Zhang et al.

2026-05-22 54

cs.RO 2605.22748

Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning

League-based multi-agent RL achieves quadrotor racing at 22 m/s with a 50% reduction in collisions versus single-agent baselines.

Ismail Geles, Leonard Bauersfeld, Markus Wulfmeier et al.

2026-05-22 78

cs.RO 2605.22600

Branch-Stochastic Model Predictive Control for Motion Planning under Multi-Modal Uncertainty with Scenario Clustering

The proposed Branch-Stochastic MPC with scenario clustering improves safety and real-time performance in motion planning under multi-modal uncertainty.

Zekun Xing, Ramkrishna Chaudhari, Marion Leibold et al.

2026-05-21 110

cs.RO 2605.12386

SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation

SafeManip uses LTLf (linear temporal logic over finite traces) to evaluate temporal safety in robotic manipulation, revealing that task success does not imply safe execution.

Chengyue Huang, Khang Vo Huynh, Sebastian Elbaum et al.

2026-05-13 73

cs.RO 2605.12347

Real-Time Whole-Body Teleoperation of a Humanoid Robot Using IMU-Based Motion Capture with Sim2Sim and Sim2Real Validation

Real-time whole-body teleoperation using Virdyn IMU motion capture, validated on a Unitree G1 robot through Sim2Sim and Sim2Real experiments.

Hamza Ahmed Durrani, Suleman Khan

2026-05-13 79

cs.RO 2604.24707

Passage-Aware Structural Mapping for RGB-D Visual SLAM

Introduces a passage-aware structural mapping method for RGB-D visual SLAM that detects doors and traversable openings.

Ali Tourani, Miguel Fernandez-Cortizas, Saad Ejaz et al.

2026-04-28 102

cs.RO 2604.24681

Learning Human-Intention Priors from Large-Scale Human Demonstrations for Robotic Manipulation

The MoT-HRA framework learns human-intention priors from large-scale human demonstrations, improving motion plausibility and control robustness in robotic manipulation.

Yifan Xie, YuAn Wang, Guangyu Chen et al.

2026-04-28 131

cs.RO 2604.24674

Pushing Radar Odometry Beyond the Pavement: Current Capabilities and Challenges

Radar-KISSICP and Radar-IMU improve trajectory estimation in off-road environments.

Shaunak Kolhe, Peng Jiang, Maggie Wigness et al.

2026-04-28 124

cs.RO 2604.24661

Agent-Centric Visual Reinforcement Learning under Dynamic Perturbations

ACO-MoE recovers 95.3% performance under dynamic perturbations, enhancing visual RL robustness.

Zhengru Fang, Yu Guo, Fei Liu et al.

2026-04-28 112