Paper Insights - AI Arxiv Paper Analysis

cs.NE 2606.15923

Runtime Analysis of Cartesian Genetic Programming in Evolving Boolean Functions

This paper provides the first runtime analysis of Cartesian Genetic Programming (CGP) in evolving Boolean functions, establishing a runtime bound of O(nD^5) for conjunctions and exponential runtime for XOR, and highlighting the impact of selection strategies.

Duc-Cuong Dang, Roman Kalkreuth, Andre Opris

2026-06-15 33

cs.NE 2606.15334

Large Language Model-Driven Cooperative Operator Ensemble Evolution for Permutation Flow Shop Scheduling

Proposes IG-DOE, a large language model-assisted cooperative operator ensemble evolution algorithm, integrating multi-operator switching to significantly improve permutation flow shop scheduling performance.

Rui Xu, Yufan Liao, Haoze Lv et al.

2026-06-13 35

cs.CV 2606.14703

Gaze Heads: How VLMs Look at What They Describe

This study identifies a small set of attention heads—gaze heads—in VLMs that causally track the current description region, enabling effective inference-time control via attention masks.

Rohit Gandikota, David Bau

2026-06-13 48

cs.CV 2606.14702

OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains

Introduces OmniVideo-100K, a large-scale dataset with structured scripts and evidence chains, boosting audio-visual reasoning by up to 20.59%.

Xinyue Cai, Chaoyou Fu, Yi-Fan Zhang et al.

2026-06-13 33

cs.CV 2606.14701

RATS! Patches Talk Through Registers: Emergent Parts in Register Attention Transformers

Introduces RATS (Register Attention Transformers), which discover part-level structure in a self-supervised manner using N learnable registers, achieving +12 mIoU on five segmentation benchmarks.

Timing Yang, Predrag Neskovic, Jansen Seheult et al.

2026-06-13 43

cs.CV 2606.14699

Instruct-Particulate: Scaling Feed-Forward 3D Object Articulation with Kinematic Control

Instruct-Particulate employs large-scale heterogeneous datasets and instruction-guided neural networks to efficiently predict 3D articulated structures, significantly improving generalization.

Ruining Li, Yuxin Yao, Matt Zhou et al.

2026-06-13 54

cs.AI 2606.14654

Abstracting Cross-Domain Action Sequences into Interpretable Workflows

WorkflowView leverages LLMs to abstract low-level action sequences into high-level activities with F1=0.90, demonstrating cross-domain generalization.

Gaurav Verma, Scott Counts

2026-06-13 39

cs.CR 2606.14629

When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks

This study reveals that self-verification in vision-language models can cause regression when verifier quality is task-specific, supported by variance theorem analysis.

Jianzhe Lin

2026-06-13 16

cs.CL 2606.14626

Characterizing Cultural Localization in AI-Generated Stories

Proposes a method combining lexical token analysis and multi-word similarity to quantify cultural localization in AI-generated stories, revealing that only 9-17% of the vocabulary accounts for cultural differences.

Shaily Bhatt, Supriti Vijay, Jeremiah Milbauer et al.

2026-06-13 55

cs.RO 2606.14609

Safe Reinforcement Learning of Autonomous Highway Driving: A Unified Framework for Safety and Efficiency

Proposes MoE-RM-SRL, which integrates reward machines, safe distance, and sparse gating experts to achieve safe and efficient autonomous highway driving.

Chufei Yan, Zhihao Cui, Yiyan Lv et al.

2026-06-13 45

cs.NE 2606.14202

MeEvo: Metacognitive Evolution Combined with Natural Evolution for Automatic Heuristic Design

MeEvo combines natural evolution and metacognitive reflection through cyclic alternation, significantly improving search stability and solution quality on complex optimization tasks.

Zishang Qiu, Xinan Chen, Rong Qu et al.

2026-06-12 37

cs.NE 2606.13985

Co-Evolved Spiking Neural Network Ensembles via Marginal Contribution Fitness

Proposes a co-evolutionary SNN ensemble framework based on marginal contribution fitness, significantly improving multi-task performance.

Catherine Rodriquez, James Ghawaly

2026-06-12 32

cs.CV 2606.13679

InterleaveThinker: Reinforcing Agentic Interleaved Generation

InterleaveThinker employs a multi-agent framework with a planner and critic, achieving high-quality interleaved text-image generation with step-wise reinforcement learning, improving performance on benchmarks by over 50%.

Dian Zheng, Harry Lee, Manyuan Zhang et al.

2026-06-12 70

cs.CV 2606.13676

Modality Forcing for Scalable Spatial Generation

Proposes Modality Forcing, a post-training method that enables a single DiT model to jointly generate image and sparse depth data, achieving a 57% reduction in AbsRel and scaling with model size.

Bardienus Pieter Duisterhof, Deva Ramanan, Jeffrey Ichnowski et al.

2026-06-12 98

cs.RO 2606.13675

Improving Robotic Generalist Policies via Flow Reversal Steering

Flow Reversal Steering (FRS) leverages reverse flow models to map coarse actions into high-quality behaviors, boosting zero-shot control and rapid learning in robotic policies.

Andy Tang, William Chen, Andrew Wagenmaker et al.

2026-06-12 67

cs.CV 2606.13673

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

SpatialClaw employs code as an action interface, achieving 59.9% average accuracy across 20 spatial reasoning benchmarks, outperforming recent models by 11.2%.

Seokju Cho, Ryo Hachiuma, Abhishek Badki et al.

2026-06-12 155

cs.AI 2606.13670

Automated reproducibility assessments in the social and behavioral sciences using large language models

Uses large language models (e.g., Claude 4.7) to automate reproducibility assessment in the social and behavioral sciences, matching effect sizes within ±0.05 and supporting conclusions with high accuracy.

Tobias Holtdirk, Pietro Marcolongo, Anna Steinberg Schulten et al.

2026-06-12 91

cs.LG 2606.13657

Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation

This paper analyzes the sparsity and geometric structure of on-policy distillation (OPD), revealing small, coordinate-sparse updates that are spectrally concentrated and deviate from source principal directions.

Guo Yu, Wenlin Liu, Yulan Hu et al.

2026-06-12 94

cs.CV 2606.13655

Flex4DHuman: Flexible Multi-view Video Diffusion for 4D Human Reconstruction

Flex4DHuman employs relative camera-pose encoding within a diffusion framework to synthesize synchronized multi-view videos from monocular or sparse inputs, surpassing prior methods without explicit geometry priors.

Jen-Hao Cheng, Yipeng Wang, Hao Zhang et al.

2026-06-12 66

cs.CL 2606.13634

Operads for compositional reasoning in LLMs

Introduces operads as a formal framework for question decomposition, with operadic consistency correlating strongly with model accuracy across multiple datasets.

Nathaniel Bottman, Kyle Richardson

2026-06-12 1 citation 62