Paper Insights - AI Arxiv Paper Analysis

cs.LG 2606.19236

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

Proposes STARE, a surprisal-guided token-level advantage reweighting method that stabilizes policy entropy and improves accuracy by 4%-8% on models from 1.5B to 32B.

Haipeng Luo, Qingfeng Sun, Songli Wu et al.

2026-06-18 36

cs.LG 2606.18933

Zero-Shot Active Feature Acquisition via LLM-Elicitation

Proposes a zero-shot active feature acquisition framework using LLM-derived discriminative statistics and MaxEnt closure, significantly improving IBD diagnosis accuracy.

Binyamin Perets, Natalie Mendelson, Shiran Vainberg et al.

2026-06-17 26

cs.CV 2606.18249

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

UniAR introduces a unified autoregressive framework with a single discrete visual tokenizer, achieving state-of-the-art results in image generation and understanding.

Wujian Peng, Lingchen Meng, Yuxuan Cai et al.

2026-06-17 34

cs.RO 2606.18247

Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement

The VERITAS framework uses inference-time verification with visual models to improve robot policy success rates by 10% without additional training.

Mingtong Zhang, Dhruv Shah

2026-06-17 37

cs.CV 2606.18242

EventDrive: Event Cameras for Vision-Language Driving Intelligence

EventDrive integrates event cameras with vision-language models, significantly improving perception, understanding, prediction, and planning in autonomous driving.

Dongyue Lu, Rong Li, Ao Liang et al.

2026-06-17 38

cs.LG 2606.18208

Looped World Models

Proposes LoopWM, a parameter-shared transformer with iterative latent refinement, achieving 100× parameter efficiency and stable long-horizon environment prediction.

Hongyuan Adam Lu, Z. L. Victor Wei, Qun Zhang et al.

2026-06-17 40

cs.CL 2606.18203

RubricsTree: Scalable and Evolving Open-Ended Evaluation of Personal Health Agents across Health Memory and Medical Skills

RubricsTree constructs a hierarchical Boolean rubric system guided by expert-curated clinical criteria, enabling scalable, expert-aligned evaluation with over 100 atomic metrics, surpassing industry baselines.

Weizhi Zhang, Zechen Li, Hamid Palangi et al.

2026-06-17 41

cs.AI 2606.18191

DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction

Proposes the DRFLOW benchmark, which evaluates personalized workflow prediction across 100 tasks and 1246 steps with 7 metrics, using multi-source evidence integration.

Md Tawkat Islam Khondaker, Raymond Li, Muhammad Abdul-Mageed et al.

2026-06-17 26

cs.CR 2606.18190

Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation

Constructs a multi-source log dataset of 870 sessions and 2.3 million events labeled with ATT&CK techniques, then fine-tunes three small language models (Qwen, Llama, Phi) with LoRA, achieving up to 97% accuracy in chunk classification.

Abir Ashab Niloy, Ahmed Ryan, Imamul Hossain Rafi et al.

2026-06-17 23

cs.LG 2606.18186

Kolmogorov Regression for Robust Diffusion Policies

Introduces Kolmogorov PDE-based diffusion policies with dimension-independent convergence, improving long-horizon control in robotics and manufacturing.

Lekan Molu

2026-06-17 26

cs.SD 2606.17775

A Neuromorphic Trigger for Efficient Audio Event Detection

Proposes a lightweight neuromorphic trigger based on a fully connected LIF SNN, achieving 0.97 F1 on ASD and a 42.6× FLOPs reduction in SED.

Benjamin Hatton, Oliver Rhodes, Luca Peres

2026-06-16 10

cs.IR 2606.17707

Do Generative Recommenders Deepen the Information Cocoon? A Closed-Loop Simulation with LLM-powered User Simulators

Introduces RecLoop, a closed-loop simulation framework for comparing generative and traditional recommenders; generative models better preserve diversity but still exhibit cocoon effects.

Jiyuan Yang, Gengxin Sun, Mengqi Zhang et al.

2026-06-16 36

cs.CL 2606.17041

Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio

This study benchmarks 12 LLM pipeline configurations on MetaSyn, revealing a screening bottleneck with a maximum recall of 52.7% despite 90.9% retrieval recall at K=200.

Anzhe Xie, Weihang Su, Yujia Zhou et al.

2026-06-16 1 citation 46

cs.RO 2606.17040

R2RDreamer: 3D-aware Data Augmentation for Spatially-generalized 2D Manipulation Policies

R2RDreamer enhances 2D manipulation policies' spatial generalization via 3D-aware augmentation and occlusion-aware video completion, achieving significant performance gains with limited demonstrations.

Xiuwei Xu, Haowen Sun, Angyuan Ma et al.

2026-06-16 37

cs.CV 2606.17030

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

Proposes Qwen-RobotWorld, a language-conditioned video world model using double-stream MMDiT and 8.6M embodied video-text pairs, achieving top performance on multiple benchmarks.

Jie Zhang, Xiaoyue Chen, Anzhe Chen et al.

2026-06-16 77

cs.CV 2606.17027

MeshLoom: Feed-Forward Non-Rigid Registration of Mesh Sequences

MeshLoom is a feed-forward non-rigid mesh registration network that reconstructs vertex deformations across sequences within seconds, outperforming state-of-the-art methods.

Jianqi Chen, Jiraphon Yenphraphai, Xiangjun Tang et al.

2026-06-16 62

math.ST 2606.17022

Learning the Geometry of Data: A Mathematical Review of Shape Space Analysis

Integrates Riemannian geometry and deep learning to analyze biological shape variability, improving classification and trajectory modeling.

Gary P. T. Choi, Khanh Dao Duc, Shira Faigenbaum-Golovin et al.

2026-06-16 46

cs.SD 2606.17006

TuneJury: An Open Metric for Improving Music Generation Preference Alignment

TuneJury is a pairwise preference reward model trained on 17,500 human judgments, achieving 0.7086 accuracy, outperforming non-pseudo-label models for music preference alignment.

Yonghyun Kim, Junwon Lee, Haiwen Xia et al.

2026-06-16 21

cs.IR 2606.16970

A Theoretical Framework for Risk Analysis of Stochastic Rankers

Develops a theoretical framework for reranking risk based on DCG variation, validated by experiments on TREC data that show close alignment between predicted and observed deviations.

Debasis Ganguly

2026-06-16 37

physics.optics 2606.16261

Wavelength-Multiplexed 2D Beam Steering via a Passive Diffractive Network

A deep learning-optimized multilayer passive diffractive network achieves 625 channels of 2D beam steering across 400-750 nm with subwavelength accuracy.

Che-Yung Shen, Yuhang Li, Cagatay Isil et al.

2026-06-15 27