AgentRVOS: Reasoning over Object Tracks for Zero-Shot Referring Video Object Segmentation
AgentRVOS combines SAM3 with an MLLM for zero-shot referring video object segmentation, achieving leading performance.
Woojeong Jin, Jaeho Lee, Heeseong Shin et al.
CSTS enhances cross-environment AI detection stability through entity-relational abstraction, addressing schema perturbation collapse.
Abdul Rahman
c-CRAB dataset evaluates code review agents' abilities; current agents solve only 40% of tasks.
Yuntong Zhang, Zhiyuan Pan, Imam Nur Bani Yusuf et al.
3DCity-LLM enhances 3D city-scale perception with a coarse-to-fine feature encoding strategy, leveraging a 1.2M-sample dataset.
Yiping Chen, Jinpeng Li, Wenyu Ke et al.
Study shows LLMs struggle with test generation under software evolution, with pass rates dropping to 66% under semantic changes.
Sabaat Haroon, Mohammad Taha Khan, Muhammad Ali Gulzar
D2TC method evades IDS in IoT networks, increasing attack success rate.
Islam Debicha, Tayeb Kenaza, Ishak Charfi et al.
SortedRL accelerates RL training for LLMs through online length-aware scheduling, enhancing efficiency and performance.
Yiqi Zhang, Huiqiang Jiang, Xufang Luo et al.
Graph Energy Matching (GEM) surpasses discrete diffusion models in molecular graph generation.
Michal Balcerak, Suprosanna Shit, Chinmay Prabhakar et al.
VideoDetective enhances long video understanding by integrating extrinsic query and intrinsic relevance, boosting VideoMME-long accuracy by 7.5%.
Ruoliu Yang, Chu Wu, Caifeng Shan et al.
UNITE achieves unified tokenization and latent diffusion with an autoencoder, reaching FID 2.12 on ImageNet.
Shivam Duggal, Xingjian Bai, Zongze Wu et al.
DualCoT-VLA enhances vision-language-action models with parallel reasoning for complex tasks, achieving state-of-the-art performance.
Zhide Zhong, Junfeng Li, Junjie He et al.
3D-Layout-R1 achieves language-guided spatial layout editing via scene graph reasoning, with a 15% IoU increase and 25% reduction in center-distance error.
Haoyu Zhen, Xiaolong Li, Yilin Zhao et al.
Scaling DoRA achieves high-rank adaptation via factored norms and fused kernels, significantly reducing memory usage and enhancing speed.
Alexandra Zelenin, Alexandra Zhuravlyova
TiCo method significantly enhances time control in dialogue models using Spoken Time Markers, reducing MAE to 4.54 seconds.
Kai-Wei Chang, Wei-Chih Chen, En-Pei Hu et al.
DexDrummer combines trajectory planning and residual RL, achieving an F1 score of 1.0.
Hung-Chieh Fang, Amber Xie, Jennifer Grannen et al.
MemDLM embeds a simulated denoising process into training via bi-level optimization, enhancing DLM training efficiency and long-context understanding.
Zehua Pei, Hui-Ling Zhen, Weizhe Lin et al.
SPA method uses carefully designed prompts to generate large-scale synthetic data for effective knowledge injection.
Kexian Tang, Jiani Wang, Shaowen Wang et al.
NMR framework addresses humanoid motion retargeting via dynamic mapping, significantly reducing joint jumps and self-collisions.
Qingrui Zhao, Kaiyue Yang, Xiyu Wang et al.
LumosX uses relational self-attention and cross-attention for personalized video generation, enhancing face-attribute alignment.
Jiazheng Xing, Fei Du, Hangjie Yuan et al.
VideoSeek actively seeks critical evidence using video logic flow, reducing frame usage by 93% and improving LVBench accuracy by 10.2 points.
Jingyang Lin, Jialian Wu, Jiang Liu et al.