Paper Insights - AI arXiv Paper Analysis

cs.CL 2605.31387

Multi-Turn Multi-Agent Dialogue for Collaborative Reconstruction Improves VLM Performance on Spatial Reasoning, But Only Barely

This study introduces a multi-turn, multi-agent dialogue framework for evaluating VLMs on spatial reasoning and finds only limited improvements, attributed mainly to difficulties with visual grounding.

Chalamalasetti Kranti, Sherzod Hakimov, David Schlangen

2026-05-29 61

cs.NE 2605.31299

Memristor-Based Spiking Neural Network Accelerator for Bio-inspired Interception Task

A memristor-based analog SNN accelerator achieves 12.7× lower energy consumption and 1.26× lower delay, enabling real-time edge intelligence for bio-inspired interception tasks.

Qianhou Qu, Sheng Lu, Liuting Shang et al.

2026-05-29 86

cs.LG 2605.31261

Why Linear Recurrent Memory Works in Partially Observable Reinforcement Learning

This paper establishes a theoretical foundation for linear recurrent memory units (ALF) in partially observable reinforcement learning, constructing two linear filters that exactly replicate belief dynamics.

Yike Zhao, Onno Eberhard, Malek Khammassi et al.

2026-05-29 80
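Here is a minimal sketch of the underlying intuition. It assumes a discrete POMDP with a single fixed action. The unnormalized Bayes filter update is linear in the previous belief, and only the final normalization is nonlinear. A linear recurrence can therefore carry the same information as the exact belief. The matrices T and O and the observation sequence are hypothetical toy values. This is a textbook illustration, not the ALF construction from the paper.

```python
# Sketch (assumption: discrete POMDP, one fixed action, toy numbers).
# The unnormalized filter h_t = O[:, o_t] * (T^T h_{t-1}) is linear in h;
# normalizing h_t at any time recovers the exact Bayes belief b_t.

def normalize(h):
    z = sum(h)
    return [x / z for x in h]

def linear_step(T, O, h, obs):
    """Predict with T (T[s][s'] = P(s'|s)), then weight by P(obs|s')."""
    n = len(h)
    pred = [sum(T[s][sp] * h[s] for s in range(n)) for sp in range(n)]
    return [O[sp][obs] * pred[sp] for sp in range(n)]

def bayes_step(T, O, b, obs):
    """Standard Bayes filter: the same linear update followed by normalization."""
    return normalize(linear_step(T, O, b, obs))

# Hypothetical 2-state, 2-observation model.
T = [[0.9, 0.1],
     [0.2, 0.8]]
O = [[0.7, 0.3],   # O[s'][o] = P(o | s')
     [0.1, 0.9]]

b0 = [0.5, 0.5]
observations = [0, 0, 1, 0]

h = b0[:]   # linear memory, never renormalized
b = b0[:]   # exact belief, renormalized every step
for o in observations:
    h = linear_step(T, O, h, o)
    b = bayes_step(T, O, b, o)

print(normalize(h), b)  # equal up to floating-point error
```

The linear state is never rescaled, so its entries shrink geometrically over long horizons. A practical implementation would need periodic rescaling or a log-space representation to avoid underflow.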

cs.NE 2605.31051

Linear Ordering Problem: Time for a Change

Introduces a multi-solution optimization framework for the Linear Ordering Problem (LOP) based on recent economic data, leveraging advanced metaheuristics to enhance solution diversity and quality.

Fabrizio Fagiolo, Marco Baioletti, Valentino Santucci

2026-05-29 76

cs.CV 2605.30351

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

VideoMLA introduces a low-rank latent KV cache that reduces memory by 92.7% for minute-scale video diffusion while maintaining high generation quality.

Hidir Yesiltepe, Jiazhen Hu, Tuna Han Salih Meral et al.

2026-05-29 120
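The name suggests an approach in the spirit of multi-head latent attention (MLA). In that approach, each token's hidden state is projected to a small latent vector, only the latent is cached, and keys and values are reconstructed from it when needed. The sketch below shows that general idea and the resulting per-token cache size. It is simplified to a single head, and the dimensions (D_MODEL, RANK) and random weights are hypothetical. It is not VideoMLA's actual architecture and is not meant to reproduce the reported 92.7% figure.

```python
import random

random.seed(0)

D_MODEL = 16   # hypothetical hidden size
RANK = 4       # hypothetical latent rank, RANK << D_MODEL

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(v, W):
    """Row vector v (length rows) times matrix W (rows x cols)."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(W[0]))]

# Hypothetical projections: hidden -> latent, latent -> key, latent -> value.
W_down = rand_matrix(D_MODEL, RANK)
W_up_k = rand_matrix(RANK, D_MODEL)
W_up_v = rand_matrix(RANK, D_MODEL)

class LatentKVCache:
    """Stores one RANK-dim latent per token instead of full K and V vectors."""
    def __init__(self):
        self.latents = []

    def append(self, hidden):
        self.latents.append(matvec(hidden, W_down))

    def keys_values(self):
        # Reconstruct K and V on demand from the cached latents.
        keys = [matvec(c, W_up_k) for c in self.latents]
        values = [matvec(c, W_up_v) for c in self.latents]
        return keys, values

    def floats_stored(self):
        return len(self.latents) * RANK

cache = LatentKVCache()
num_tokens = 10
for _ in range(num_tokens):
    cache.append([random.gauss(0.0, 1.0) for _ in range(D_MODEL)])

keys, values = cache.keys_values()
full_floats = num_tokens * 2 * D_MODEL   # a standard cache stores K and V
latent_floats = cache.floats_stored()
print(latent_floats, full_floats, 1 - latent_floats / full_floats)  # 40 320 0.875
```

Reconstructing K and V trades extra compute for memory. MLA-style designs can reduce that cost by folding the up-projections into the query and output projections.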

cs.CV 2605.30347

NeuROK: Generative 4D Neural Object Kinematics

NeuROK uses a transformer-based encoder-decoder, trained on large-scale geometric trajectories, to learn a low-dimensional latent space of 4D object dynamics without relying on predefined physical models.

Chen Geng, Guangzhao He, Yue Gao et al.

2026-05-29 61

cs.CL 2605.30348

LLMSurgeon: Diagnosing Data Mixture of Large Language Models

LLMSurgeon formulates data mixture diagnosis as a label-shift inverse problem, achieving 94.46% accuracy on the LLMSurgeon benchmark.

Yaxin Luo, Jiacheng Cui, Xiaohan Zhao et al.

2026-05-29 84
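One generic way to pose a label-shift inverse problem is confusion-matrix inversion, as in black-box shift estimation. Suppose a classifier's confusion matrix C[i][j] = P(predicted i | true j) is known. Then the observed distribution of predictions mu satisfies mu = C w, where w is the unknown mixture, and w can be recovered by solving that linear system. The sketch below recovers w for a hypothetical three-domain mixture. The domains, C, and the true weights are made up. This is not claimed to be LLMSurgeon's estimator.

```python
# Hypothetical domains and confusion matrix (columns sum to 1):
# C[i][j] = P(classifier predicts domain i | sample truly from domain j).
DOMAINS = ["web", "code", "math"]
C = [[0.80, 0.10, 0.05],
     [0.15, 0.85, 0.15],
     [0.05, 0.05, 0.80]]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def estimate_mixture(C, predicted_freq):
    """Invert mu = C w, then clip negatives and renormalize to a distribution."""
    w = [max(0.0, v) for v in solve(C, predicted_freq)]
    total = sum(w)
    return [v / total for v in w]

# Simulate the prediction frequencies that a hypothetical true mixture produces.
true_w = [0.6, 0.3, 0.1]
mu = [sum(C[i][j] * true_w[j] for j in range(3)) for i in range(3)]

w_hat = estimate_mixture(C, mu)
for name, v in zip(DOMAINS, w_hat):
    print(f"{name}: {v:.3f}")  # web: 0.600, code: 0.300, math: 0.100
```

In practice, mu would be estimated from classifier predictions on a finite sample. The solve is therefore noisy, and the clip-and-renormalize step matters.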

cs.CV 2605.30346

YoCausal: How Far is Video Generation from World Model? A Causality Perspective

YoCausal introduces a two-level causality benchmark built on real-world videos and natural reversal, and uses the RSI and CCI metrics to evaluate the causal understanding of 13 state-of-the-art video diffusion models.

You-Zhe Xie, Yu-Hsuan Li, Jie-Ying Lee et al.

2026-05-29 72

cs.AI 2605.30345

SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations

SchGen introduces a semantic-grounded code representation for generating PCB schematics from natural-language prompts, achieving 82% circuit validity and 60.5% functional correctness.

Qinpei Luo, Ruichun Ma, Xinyu Zhang et al.

2026-05-29 240

cs.AI 2605.30344

Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection

The proposed VisAnomReasoner, fine-tuned on VisAnomBench, achieves 74.30% precision and 72.17% F1 in time-series anomaly detection, surpassing baselines by more than 21 and 23 points, respectively.

Xiaona Zhou, Muntasir Wahed, Tianjiao Yu et al.

2026-05-29 97

cs.CV 2605.30341

GPIC: A Giant Permissive Image Corpus for Visual Generation

Introduces GPIC, a permissively licensed image corpus of 28 trillion pixels, to advance visual generative modeling.

Keshigeyan Chandrasegaran, Kyle Sargent, Suchir Agarwal et al.

2026-05-29 53

cs.LG 2605.30337

Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching

HullFT uses convex reconstruction and gradient caching for efficient test-time fine-tuning, improving the speed-quality trade-off for large language models.

Alaa Khamis, Alaa Maalouf

2026-05-29 180

cs.CL 2605.30333

COMPOSE: Composing Future Theorems from Citations and Formal Structure

Proposes COMPOSE, a dual-graph framework that combines citation graphs with formal theorem graphs to generate plausible future theorems; it uses 108K training pairs and is tested on 47K future papers.

David Busbib, Michael Werman

2026-05-29 94

cs.CL 2605.30295

MedCase-Structured: A Text-to-FHIR Dataset for Benchmarking Diagnostic Reasoning in Clinically Realistic EHR Settings

Proposes MedCase-Structured, a pipeline combining LLMs and terminology validation to generate HL7 FHIR R4 clinical datasets for diagnostic reasoning, with an 82.5% success rate.

Valentina Bui Muti, Eugénie Dulout, Ziquan Fu

2026-05-29 96

cs.LG 2605.30119

Evolving Features vs Evolving Entire Trees with GP for Interpretable Survival Analysis

Combining multi-objective genetic programming with survival tree optimization, this study enhances predictive accuracy and interpretability in survival analysis, validated on two real-world datasets.

Thalea Schlender, Peter A. N. Bosman, Tanja Alderliesten

2026-05-28 59

cs.LG 2605.29543

SCOPE: A Lightweight-training LLM Framework for Air Traffic Control Readback Monitoring

SCOPE integrates a frozen LLM with an open-set plugin classifier, achieving 91.05% open-set detection accuracy and 96.63% anomaly correction in ATC readback monitoring.

Qihan Deng, Minghua Zhang, Yang Yang et al.

2026-05-28 86

cs.CV 2605.28820

From Pixels to Words -- Towards Native One-Vision Models at Scale

NEO-ov, a fully native end-to-end vision-language model, supports multi-image and video understanding with superior fine-grained perception and spatial reasoning.

Haiwen Diao, Jiahao Wang, Penghao Wu et al.

2026-05-28 96

cs.CL 2605.28814

Self-Improving Language Models with Bidirectional Evolutionary Search

Proposes Bidirectional Evolutionary Search (BES), combining forward candidate evolution with backward goal decomposition to enhance exploration and verification in language models.

Guowei Xu, Zhenting Qi, Huangyuan Su et al.

2026-05-28 167

cs.AI 2605.28807

Calibrating Conservatism for Scalable Oversight

Proposes Calibrated Collective Oversight (CCO), which integrates multiple auxiliary signals with Conformal Decision Theory for online calibration, keeping AI behavior aligned with safety targets.

William Overman, Mohsen Bayati

2026-05-28 122

cs.CV 2605.28806

Personal Visual Memory from Explicit and Implicit Evidence

VisualMem introduces a structured visual memory module integrated with text memory, achieving 95% accuracy in personal entity recall, surpassing caption-based methods by over 40%.

Viet Nguyen, Thao Nguyen, Vishal M. Patel et al.

2026-05-28 129