Paper Insights - AI Arxiv Paper Analysis

cs.AI 2606.20526

DeepSWIP: Quotient-WMC Counterfactuals for Neural Probabilistic Logic Programs

DeepSWIP leverages neural materialization and quotient WMC for exact single-world counterfactual inference in neural probabilistic logic programs, achieving a 2.14× speedup.

Saimun Habib, Vaishak Belle, Fengxiang He

2026-06-19 33

cs.AI 2606.19911

Multi-Agent Transactive Memory

Multi-Agent Transactive Memory (MATM) lets heterogeneous agent populations share trajectories, improving success rate by 8% and reducing steps by 0.59 in interactive tasks.

To Eun Kim, Xuhong He, Dishank Jain et al.

2026-06-18 20

cs.AI 2606.18191

DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction

DRFLOW is a benchmark with 7 metrics that evaluates personalized workflow prediction across 100 tasks and 1246 steps, using multi-source evidence integration.

Md Tawkat Islam Khondaker, Raymond Li, Muhammad Abdul-Mageed et al.

2026-06-17 28

cs.AI 2606.14654

Abstracting Cross-Domain Action Sequences into Interpretable Workflows

WorkflowView leverages LLMs to abstract low-level action sequences into high-level activities with F1=0.90, demonstrating cross-domain generalization.

Gaurav Verma, Scott Counts

2026-06-13 41

cs.AI 2606.13670

Automated reproducibility assessments in the social and behavioral sciences using large language models

Uses large language models (e.g., Claude 4.7) for automated reproducibility assessment in the social sciences, matching effect sizes within ±0.05 and supporting conclusions with high accuracy.

Tobias Holtdirk, Pietro Marcolongo, Anna Steinberg Schulten et al.

2026-06-12 93

cs.AI 2606.11173

The Role of Feedback Alignment in Self-Distillation

This paper introduces feedback alignment in self-distillation, comparing three feedback types; structure-aligned critique outperforms others with +16.11% accuracy.

Semih Kara, Oğuzhan Ersoy

2026-06-10 72

cs.AI 2606.11078

A History-Aware Visually Grounded Critic for Computer Use Agents

Proposes HiViG, a history-aware visually grounded test-time framework, boosting GUI task success rates by 5.8% (Qwen3-VL-32B) and 9% (Gemini-3-Flash) through macro-action history and visual error verification.

Jaewoo Lee, Zaid Khan, Archiki Prasad et al.

2026-06-10 95

cs.AI 2606.07489

How AI Agents Reshape Knowledge Work: Autonomy, Efficiency, and Scope

Applying a task-based framework to real-world data from Perplexity, the paper finds that AI agents significantly boost automation, efficiency, and task scope, with productivity gains of up to 87%.

Jeremy Yang, Kate Zyskowski, Noah Yonack et al.

2026-06-06 117

cs.AI 2606.06473

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

MLEvolve is a self-evolving multi-agent framework using LLMs for end-to-end machine learning algorithm discovery, achieving 65.3% medal rate within 12 hours.

Shangheng Du, Xiangchao Yan, Jinxin Shi et al.

2026-06-05 79

cs.AI 2606.02530

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment

SafeSteer employs localized on-policy distillation focusing on safety tokens, reducing reliance on large datasets and auxiliary reward models, achieving a superior safety-capability trade-off.

Hao Li, Jingkun An, Zijun Song et al.

2026-06-02 118

cs.AI 2606.02484

Iteris: Agentic Research Loops for Computational Mathematics

Iteris employs an explore-plan-execute loop with multi-agent collaboration to generate numerical evidence and proof drafts, verified through expert review, advancing open problems in computational mathematics.

Leheng Chen, Zihao Liu, Wanyi He et al.

2026-06-02 192

cs.AI 2605.31581

Choosing the Lens: Strategic Perspective Activation in Context-Dependent Argumentation

Introduces context-dependent argumentation frameworks (CDAFs) with perspective activation and analyzes the complexity of strategic attack manipulation, establishing NP-completeness bounds.

Albert Sadowski, Jarosław A. Chudziak

2026-05-30 76

cs.AI 2605.30344

Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection

VisAnomReasoner, fine-tuned on VisAnomBench, achieves 74.30% precision and 72.17% F1 in time-series anomaly detection, surpassing baselines by over 21 and 23 points, respectively.

Xiaona Zhou, Muntasir Wahed, Tianjiao Yu et al.

2026-05-29 97

cs.AI 2605.30345

SchGen: PCB Schematic Generation with Semantic-Grounded Code Representations

SchGen introduces a semantic-grounded code model for PCB schematic generation, achieving 82% valid circuits with 60.5% functional correctness from natural language prompts.

Qinpei Luo, Ruichun Ma, Xinyu Zhang et al.

2026-05-29 240

cs.AI 2605.28807

Calibrating Conservatism for Scalable Oversight

Proposes Calibrated Collective Oversight (CCO), integrating multiple auxiliary signals with Conformal Decision Theory for online calibration, ensuring AI behavior aligns with safety targets.

William Overman, Mohsen Bayati

2026-05-28 122

cs.AI 2605.27366

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

MUSE-Autoskill improves task success to 68.4% via unified skill lifecycle management and cross-agent skill transfer.

Huawei Lin, Peng Li, Jie Song et al.

2026-05-27 394

cs.AI 2605.27361

Natural Language Query to Configuration for Retrieval Agents

BRANE uses LLM-extracted query features to dynamically optimize retrieval configurations, achieving up to 89% cost savings on MuSiQue and other datasets.

Melissa Z. Pan, Negar Arabzadeh, Mathew Jacob et al.

2026-05-27 64

cs.AI 2605.22794

MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems

MOSS enables source-level self-rewriting in autonomous agents, boosting OpenClaw’s four-task mean grader score from 0.25 to 0.61 in one cycle.

Qianshu Cai, Yonggang Zhang, Xianzhang Jia et al.

2026-05-22 593

cs.AI 2605.22786

LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems

LCGuard uses adversarially learned transformations on Transformer KV caches to reduce sensitive information reconstruction in multi-agent systems while preserving task performance.

Sadia Asif, Mohammad Mohammadi Amiri, Momin Abbas et al.

2026-05-22 316

cs.AI 2605.12481

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

ToolCUA optimizes GUI-Tool path selection via staged training, achieving 46.85% accuracy.

Xuhao Hu, Xi Zhang, Haiyang Xu et al.

2026-05-13 216