Paper Insights - AI Arxiv Paper Analysis

cs.CV 2603.15620

Towards Generalizable Robotic Manipulation in Dynamic Environments

PUMA model improves success rate by 6.3% in dynamic environments using historical optical flow and world queries.

Heng Fang, Shangru Li, Shuhan Wang et al.

2026-03-17 1 citation 316

cs.CL 2603.15619

Mixture-of-Depths Attention

Mixture-of-Depths Attention (MoDA) improves downstream task performance by 2.11% on a 1.5B-parameter model with only a 3.7% increase in FLOPs.

Lianghui Zhu, Yuxin Fang, Bencheng Liao et al.

2026-03-17 123

cs.LG 2603.15617

HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification

HorizonMath evaluates AI progress in mathematical discovery using an automated verification framework, with GPT 5.4 Pro achieving breakthroughs on two problems.

Erik Y. Wang, Sumeet Motwani, James V. Roggeveen et al.

2026-03-17 141

cs.CV 2603.15616

GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering

GlyphPrinter enhances glyph accuracy using Region-Grouped Direct Preference Optimization, surpassing existing methods.

Xincheng Shuai, Ziye Li, Henghui Ding et al.

2026-03-17 1 citation 151

cs.CL 2603.15615

Mechanistic Origin of Moral Indifference in Language Models

Sparse Autoencoders are used to correct moral indifference in language models, achieving a 75% win rate on adversarial benchmarks.

Lingyu Li, Yan Teng, Yingchun Wang

2026-03-17 125

cs.CV 2603.15614

Tri-Prompting: Video Diffusion with Unified Control over Scene, Subject, and Motion

Tri-Prompting method significantly outperforms Phantom and DaS in multi-view subject consistency and motion accuracy.

Zhenghong Zhou, Xiaohang Zhan, Zhiqin Chen et al.

2026-03-17 194

cs.CV 2603.15612

HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions

HSImul3R uses physics feedback to optimize stable human-scene interaction 3D reconstructions, significantly enhancing simulation stability.

Yukang Cao, Haozhe Xie, Fangzhou Hong et al.

2026-03-17 129

cs.CL 2603.15611

Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning

Code-A1 enhances code and test generation through an adversarial co-evolution framework.

Aozhe Wang, Yuchen Yan, Nan Zhou et al.

2026-03-17 2 citations 116

cs.AI 2603.15607

Do Metrics for Counterfactual Explanations Align with User Perception?

The study finds that counterfactual explanation metrics do not align with user perception, necessitating more human-centered evaluation methods.

Felix Liedeker, Basil Ell, Philipp Cimiano et al.

2026-03-17 143

cs.RO 2603.15600

From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation

PRIMO R1 transforms video MLLMs into active 'Critics' using reinforcement learning, achieving 67.0% accuracy on the RoboFail benchmark.

Yibin Liu, Yaxing Lyu, Daqi Gao et al.

2026-03-17 107

cs.AI 2603.15594

OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

OpenSeeker democratizes frontier search agents by fully open-sourcing its training data, which is built with controllable QA synthesis and denoised trajectory synthesis.

Yuwen Du, Rui Ye, Shuo Tang et al.

2026-03-17 3 citations 351

cs.LG 2603.15590

Effective Distillation to Hybrid xLSTM Architectures

Distillation to hybrid xLSTM architectures recovers and exceeds teacher model performance.

Lukas Hauzenberger, Niklas Schmidinger, Thomas Schmied et al.

2026-03-17 145

cs.AI 2603.15586

Computational Concept of the Psyche

The paper proposes a cognitive architecture for constructing AGI that views the psyche as an operating system.

Anton Kolonin, Vladimir Krykov

2026-03-17 109

cond-mat.mtrl-sci 2603.15582

Benchmarking Machine Learning Approaches for Polarization Mapping in Ferroelectrics Using 4D-STEM

ResNet and VGG models are benchmarked for polarization mapping in 4D-STEM, achieving 99.8% accuracy on synthetic data.

Matej Martinc, Goran Dražič, Anton Kokalj et al.

2026-03-17 101

cs.SE 2603.15566

Lore: Repurposing Git Commit Messages as a Structured Knowledge Protocol for AI Coding Agents

Lore protocol repurposes git commit messages into structured knowledge using git trailers, enhancing decision records for AI coding agents.

Ivan Stetsenko

2026-03-17 196
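
For readers unfamiliar with git trailers, the mechanism the Lore summary mentions, the following is a minimal sketch of how trailer lines (`Key: value` pairs in the final paragraph of a commit message) might be read back as structured data. The trailer keys `Decision` and `Rejected`, the function name `parse_trailers`, and the sample commit are hypothetical illustrations, not taken from the Lore paper. The parser is also deliberately simplified: git's own `git interpret-trailers --parse` handles more cases, such as continuation lines.

```python
import re

# A trailer line has the shape "Key: value"; keys use letters, digits, hyphens.
TRAILER_RE = re.compile(r"^([A-Za-z0-9-]+):\s*(.*)$")


def parse_trailers(message: str) -> dict[str, list[str]]:
    """Return trailers found in the last paragraph of a commit message.

    Hypothetical helper for illustration; repeated keys are collected in order.
    """
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return {}  # a lone subject line carries no trailer block
    trailers: dict[str, list[str]] = {}
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match is None:
            return {}  # simplification: every line must look like a trailer
        trailers.setdefault(match.group(1), []).append(match.group(2))
    return trailers


# Hypothetical commit message; the keys below are assumed, not Lore's schema.
example = """Switch cache to LRU eviction

FIFO eviction dropped hot entries under load.

Decision: use-lru-cache
Rejected: fifo-cache
Rejected: unbounded-cache
"""

print(parse_trailers(example))
```

Because trailers live in the commit message itself, a record like this travels with the repository history and needs no separate store, which is presumably what makes them attractive as a carrier for decision records.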

cs.LG 2603.15563

The PokeAgent Challenge: Competitive and Long-Context Learning at Scale

PokeAgent Challenge tests AI decision-making via Pokemon battles and RPG, offering a 20M+ dataset and standardized evaluation framework.

Seth Karten, Jake Grigsby, Tersoo Upaa et al.

2026-03-17 10 citations 175

cs.LG 2603.13228

PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization

PhysMoDPO optimizes humanoid motion for physical realism and task performance through preference optimization.

Yangsong Zhang, Anujith Muraleedharan, Rikhat Akizhanov et al.

2026-03-14 157

cs.LG 2603.13227

Representation Learning for Spatiotemporal Physical Systems

Using Joint Embedding Predictive Architectures (JEPA) for learning representations in latent space significantly enhances parameter estimation accuracy.

Helen Qu, Rudy Morel, Michael McCabe et al.

2026-03-14 2 citations 192

cs.CV 2603.13224

Visual-ERM: Reward Modeling for Visual Equivalence

Visual-ERM enhances vision-to-code tasks with fine-grained visual rewards, significantly outperforming existing models.

Ziyu Liu, Shengyuan Ding, Xinyu Fang et al.

2026-03-14 118

cs.CV 2603.13215

Out of Sight, Out of Mind? Evaluating State Evolution in Video World Models

STEVO-Bench evaluates video world models' ability to evolve state during observation interruptions, revealing limitations.

Ziqi Ma, Mengzhan Liufu, Georgia Gkioxari

2026-03-14 152