Paper Insights - AI Arxiv Paper Analysis

cs.CV 2603.23489

AgentRVOS: Reasoning over Object Tracks for Zero-Shot Referring Video Object Segmentation

AgentRVOS combines SAM3 with an MLLM for zero-shot referring video object segmentation, achieving leading performance.

Woojeong Jin, Jaeho Lee, Heeseong Shin et al.

2026-03-25 233

cs.CR 2603.23459

CSTS: A Canonical Security Telemetry Substrate for AI-Native Cyber Detection

CSTS enhances cross-environment AI detection stability through entity-relational abstraction, addressing schema perturbation collapse.

Abdul Rahman

2026-03-25 84

cs.SE 2603.23448

Code Review Agent Benchmark

c-CRAB dataset evaluates code review agents' abilities; current agents solve only 40% of tasks.

Yuntong Zhang, Zhiyuan Pan, Imam Nur Bani Yusuf et al.

2026-03-25 118

cs.CV 2603.23447

3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding

3DCity-LLM enhances 3D city-scale perception with a coarse-to-fine feature encoding strategy, leveraging a 1.2M-sample dataset.

Yiping Chen, Jinpeng Li, Wenyu Ke et al.

2026-03-25 94

cs.SE 2603.23443

Evaluating LLM-Based Test Generation Under Software Evolution

Study shows LLMs struggle with test generation under software evolution, with pass rates dropping to 66% under semantic changes.

Sabaat Haroon, Mohammad Taha Khan, Muhammad Ali Gulzar

2026-03-25 138

cs.CR 2603.23438

Targeted Adversarial Traffic Generation: Black-box Approach to Evade Intrusion Detection Systems in IoT Networks

Introduces the D2TC method for evading intrusion detection systems (IDS) in IoT networks, increasing the attack success rate.

Islam Debicha, Tayeb Kenaza, Ishak Charfi et al.

2026-03-25 83

cs.LG 2603.23414

SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling

SortedRL accelerates RL training for LLMs through online length-aware scheduling, enhancing efficiency and performance.

Yiqi Zhang, Huiqiang Jiang, Xufang Luo et al.

2026-03-25 7 citations 180

cs.LG 2603.23398

Graph Energy Matching: Transport-Aligned Energy-Based Modeling for Graph Generation

Introduces Graph Energy Matching (GEM), surpassing discrete diffusion models in molecular graph generation.

Michal Balcerak, Suprosanna Shit, Chinmay Prabhakar et al.

2026-03-25 100

cs.CV 2603.22285

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

VideoDetective enhances long video understanding by integrating extrinsic query and intrinsic relevance, boosting VideoMME-long accuracy by 7.5%.

Ruoliu Yang, Chu Wu, Caifeng Shan et al.

2026-03-24 93

cs.CV 2603.22283

End-to-End Training for Unified Tokenization and Latent Denoising

UNITE achieves unified tokenization and latent diffusion with an autoencoder, reaching FID 2.12 on ImageNet.

Shivam Duggal, Xingjian Bai, Zongze Wu et al.

2026-03-24 84

cs.CV 2603.22280

DualCoT-VLA: Visual-Linguistic Chain of Thought via Parallel Reasoning for Vision-Language-Action Models

DualCoT-VLA enhances vision-language-action models with parallel reasoning for complex tasks, achieving state-of-the-art performance.

Zhide Zhong, Junfeng Li, Junjie He et al.

2026-03-24 9 citations 292

cs.CV 2603.22279

3D-Layout-R1: Structured Reasoning for Language-Instructed Spatial Editing

3D-Layout-R1 achieves language-guided spatial layout editing via scene graph reasoning, with a 15% IoU increase and 25% reduction in center-distance error.

Haoyu Zhen, Xiaolong Li, Yilin Zhao et al.

2026-03-24 1 citation 170

cs.LG 2603.22276

Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels

Scaling DoRA achieves high-rank adaptation via factored norms and fused kernels, significantly reducing memory usage and enhancing speed.

Alexandra Zelenin, Alexandra Zhuravlyova

2026-03-24 87

cs.CL 2603.22267

TiCo: Time-Controllable Training for Spoken Dialogue Models

The TiCo method significantly improves time control in spoken dialogue models using Spoken Time Markers, reducing MAE to 4.54 seconds.

Kai-Wei Chang, Wei-Chih Chen, En-Pei Hu et al.

2026-03-24 136

cs.RO 2603.22263

DexDrummer: In-Hand, Contact-Rich, and Long-Horizon Dexterous Robot Drumming

DexDrummer combines trajectory planning and residual RL, achieving an F1 score of 1.0.

Hung-Chieh Fang, Amber Xie, Jennifer Grannen et al.

2026-03-24 103

cs.CL 2603.22241

MemDLM: Memory-Enhanced DLM Training

MemDLM embeds a simulated denoising process into training via bi-level optimization, enhancing DLM training efficiency and long-context understanding.

Zehua Pei, Hui-Ling Zhen, Weizhe Lin et al.

2026-03-24 94

cs.LG 2603.22213

SPA: A Simple but Tough-to-Beat Baseline for Knowledge Injection

The SPA method uses carefully designed prompts to generate large-scale synthetic data for effective knowledge injection.

Kexian Tang, Jiani Wang, Shaowen Wang et al.

2026-03-24 108

cs.RO 2603.22201

Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control

The NMR framework addresses humanoid motion retargeting via dynamic mapping, significantly reducing joint jumps and self-collisions.

Qingrui Zhao, Kaiyue Yang, Xiyu Wang et al.

2026-03-24 3 citations 165

cs.CV 2603.20192

LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

LumosX uses relational self-attention and cross-attention for personalized video generation, enhancing face-attribute alignment.

Jiazheng Xing, Fei Du, Hangjie Yuan et al.

2026-03-21 3 citations 110

cs.CV 2603.20185

VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking

VideoSeek actively seeks critical evidence using video logic flow, reducing frame usage by 93% and improving LVBench accuracy by 10.2 points.

Jingyang Lin, Jialian Wu, Jiang Liu et al.

2026-03-21 2 citations 114