Paper Insights - AI Arxiv Paper Analysis

cs.CV 2605.12501

Covering Human Action Space for Computer Use: Data Synthesis and Benchmark

Combines data synthesis with the CUActSpot benchmark to improve and evaluate complex GUI interactions; Phi-Ground-Any-4B excels.

Miaosen Zhang, Xiaohan Zhao, Zhihong Tan et al.

2026-05-13 72

cs.CV 2605.12496

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

CausalCine achieves real-time multi-shot video generation using a causal autoregressive framework, significantly enhancing cross-shot coherence and interactivity.

Yihao Meng, Zichen Liu, Hao Ouyang et al.

2026-05-13 158

cs.CV 2605.12495

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

AlphaGRPO enhances UMMs' multimodal generation via Decompositional Verifiable Reward, significantly improving results on benchmarks such as GenEval.

Runhui Huang, Jie Wu, Rui Yang et al.

2026-05-13 84

cs.LG 2605.12492

Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

Pion optimizer preserves spectrum via orthogonal equivalence transformation, enhancing LLM training stability.

Kexuan Shi, Hanxuan Li, Zeju Qiu et al.

2026-05-13 81

cs.CL 2605.12493

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

LongMemEval-V2 evaluates long-term memory in agents; AgentRunbook-C reaches 72.5% accuracy.

Di Wu, Zixiang Ji, Asmi Kawatkar et al.

2026-05-13 811

cs.CL 2605.12487

Task-Adaptive Embedding Refinement via Test-time LLM Guidance

Refining embeddings with test-time LLM guidance improves zero-shot search and classification by up to 25%.

Ariel Gera, Shir Ashury-Tahan, Gal Bloch et al.

2026-05-13 99

cs.LG 2605.12483

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

Proposes a sparse-to-dense reward principle that combines GRPO and on-policy distillation (OPD) to enhance language model post-training.

Yuanda Xu, Hejian Sang, Zhengze Zhou et al.

2026-05-13 207

cs.AI 2605.12481

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

ToolCUA optimizes GUI-Tool path selection via staged training, achieving 46.85% accuracy.

Xuhao Hu, Xi Zhang, Haiyang Xu et al.

2026-05-13 217

cs.CV 2605.12480

OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation

OmniNFT enhances audio-video generation quality and synchronization through a modality-aware online diffusion RL framework.

Guohui Zhang, XiaoXiao Ma, Jie Huang et al.

2026-05-13 194

cs.LG 2605.12477

MEME: Multi-entity & Evolving Memory Evaluation

MEME evaluates multi-entity and evolving memory tasks, exposing dependency reasoning failures in current systems.

Seokwon Jung, Alexander Rubinstein, Arnas Uselis et al.

2026-05-13 168

cs.LG 2605.12476

Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts

The paper introduces a parameter-free online K-Means router leveraging geometric coupling for effective expert assignment, reducing load imbalance with only a slight perplexity increase.

Sagi Ahrac, Noya Hochwald, Mor Geva

2026-05-13 79

cs.AI 2605.12474

Reward Hacking in Rubric-Based Reinforcement Learning

The study proposes a framework to diagnose reward hacking in rubric-based RL and finds that even strong verification does not eliminate it.

Anas Mahmoud, MohammadHossein Rezaei, Zihao Wang et al.

2026-05-13 223

cs.LG 2605.12471

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

KV-Fold is a training-free protocol for long-context inference that achieves 100% exact-match retrieval.

Alireza Nadali, Patrick Cooper, Ashutosh Trivedi et al.

2026-05-13 109

cs.LG 2605.12466

Solve the Loop: Attractor Models for Language and Reasoning

Attractor Models enhance language modeling and reasoning via fixed-point solving, improving training efficiency by 46.6% and accuracy by 19.7%.

Jacob Fein-Ashley, Paria Rashidinejad

2026-05-13 268

cs.AI 2605.12462

Towards Affordable Energy: A Gymnasium Environment for Electric Utility Demand-Response Programs

DR-Gym environment optimizes electric utility demand response using reinforcement learning, enhancing grid flexibility and energy affordability.

Jose E. Aguilar Escamilla, Lingdong Zhou, Xiangqi Zhu et al.

2026-05-13 89

cs.LG 2605.12460

Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs

Multi-stream LLMs unblock language models with parallel streams of thoughts, inputs, and outputs, enhancing efficiency and security.

Guinan Su, Yanwu Yang, Xueyan Li et al.

2026-05-13 114

cs.CR 2605.12456

TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

TextSeal uses dual-key generation and entropy-weighted scoring to watermark LLMs, enhancing detection strength without distortion.

Tom Sander, Hongyan Chang, Tomáš Souček et al.

2026-05-13 194

eess.SP 2605.12453

Enabling AI-Native Mobility in 6G: A Real-World Dataset for Handover, Beam Management, and Timing Advance

Proposes an AI/ML-based 6G mobility solution that uses real-world data to optimize handover and beam management.

Mannam Veera Narayana, Rohit Singh, Deepa M. R et al.

2026-05-13 83

cs.CL 2605.12452

The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events

Audits LLM-generated political discourse across nine crisis events using a computational social science framework, finding it more negative and structurally consistent.

Gunjan, Sidahmed Benabderrahmane, Talal Rahwan

2026-05-13 71

cs.CV 2605.12451

FuTCR: Future-Targeted Contrast and Repulsion for Continual Panoptic Segmentation

FuTCR framework improves new-class panoptic quality by up to 28% in continual panoptic segmentation while enhancing base-class performance.

Nicholas Ikechukwu, Keanu Nichols, Deepti Ghadiyaram et al.

2026-05-13 87