Towards Generalizable Robotic Manipulation in Dynamic Environments
PUMA model improves success rate by 6.3% in dynamic environments using historical optical flow and world queries.
Heng Fang, Shangru Li, Shuhan Wang et al.
Mixture-of-Depths Attention (MoDA) improves downstream task performance by 2.11% on a 1.5B-parameter model with only a 3.7% increase in FLOPs.
Lianghui Zhu, Yuxin Fang, Bencheng Liao et al.
HorizonMath evaluates AI progress in mathematical discovery using an automated verification framework, with GPT 5.4 Pro achieving breakthroughs on two problems.
Erik Y. Wang, Sumeet Motwani, James V. Roggeveen et al.
GlyphPrinter enhances glyph accuracy using Region-Grouped Direct Preference Optimization, surpassing existing methods.
Xincheng Shuai, Ziye Li, Henghui Ding et al.
Corrects moral indifference in language models using Sparse Autoencoders, achieving a 75% win rate on adversarial benchmarks.
Lingyu Li, Yan Teng, Yingchun Wang
Tri-Prompting method significantly outperforms Phantom and DaS in multi-view subject consistency and motion accuracy.
Zhenghong Zhou, Xiaohang Zhan, Zhiqin Chen et al.
HSImul3R uses physics feedback to optimize stable human-scene interaction 3D reconstructions, significantly enhancing simulation stability.
Yukang Cao, Haozhe Xie, Fangzhou Hong et al.
Code-A1 enhances code and test generation through an adversarial co-evolution framework.
Aozhe Wang, Yuchen Yan, Nan Zhou et al.
The study finds that counterfactual explanation metrics do not align with user perception, necessitating more human-centered evaluation methods.
Felix Liedeker, Basil Ell, Philipp Cimiano et al.
PRIMO R1 transforms video MLLMs into active 'Critics' using reinforcement learning, achieving 67.0% accuracy on the RoboFail benchmark.
Yibin Liu, Yaxing Lyu, Daqi Gao et al.
OpenSeeker democratizes frontier search agents by fully open-sourcing training data, utilizing controllable QA synthesis and denoised trajectory synthesis.
Yuwen Du, Rui Ye, Shuo Tang et al.
Effective distillation of xLSTM architectures recovers and exceeds teacher model performance.
Lukas Hauzenberger, Niklas Schmidinger, Thomas Schmied et al.
Proposes a cognitive architecture viewing the psyche as an operating system for constructing AGI.
Anton Kolonin, Vladimir Krykov
Uses ResNet and VGG models for polarization mapping in 4D-STEM, achieving 99.8% accuracy on synthetic data.
Matej Martinc, Goran Dražič, Anton Kokalj et al.
Lore protocol repurposes git commit messages into structured knowledge using git trailers, enhancing decision records for AI coding agents.
Ivan Stetsenko
PokeAgent Challenge tests AI decision-making via Pokemon battles and RPG, offering a 20M+ dataset and standardized evaluation framework.
Seth Karten, Jake Grigsby, Tersoo Upaa et al.
PhysMoDPO optimizes humanoid motion for physical realism and task performance through preference optimization.
Yangsong Zhang, Anujith Muraleedharan, Rikhat Akizhanov et al.
Using Joint Embedding Predictive Architectures (JEPA) for learning representations in latent space significantly enhances parameter estimation accuracy.
Helen Qu, Rudy Morel, Michael McCabe et al.
Visual-ERM enhances vision-to-code tasks with fine-grained visual rewards, significantly outperforming existing models.
Ziyu Liu, Shengyuan Ding, Xinyu Fang et al.
STEVO-Bench evaluates video world models' ability to evolve state during observation interruptions, revealing limitations.
Ziqi Ma, Mengzhan Liufu, Georgia Gkioxari