cs.CV 2606.17027

MeshLoom: Feed-Forward Non-Rigid Registration of Mesh Sequences

MeshLoom is a feed-forward non-rigid mesh registration network that reconstructs vertex deformations across sequences within seconds, outperforming state-of-the-art methods.

Jianqi Chen, Jiraphon Yenphraphai, Xiangjun Tang et al.

2026-06-16 62
cs.CV 2606.14703

Gaze Heads: How VLMs Look at What They Describe

This study identifies a small set of attention heads in VLMs, termed gaze heads, that causally track the region currently being described, enabling effective inference-time control via attention masks.

Rohit Gandikota, David Bau

2026-06-13 48
cs.CV 2606.13679

InterleaveThinker: Reinforcing Agentic Interleaved Generation

InterleaveThinker pairs a planner and a critic in a multi-agent framework and applies step-wise reinforcement learning to produce high-quality interleaved text-image generation, improving benchmark performance by over 50%.

Dian Zheng, Harry Lee, Manyuan Zhang et al.

2026-06-12 70
cs.CV 2606.13676

Modality Forcing for Scalable Spatial Generation

Modality Forcing is a post-training method that enables a single DiT model to jointly generate image and sparse depth data, achieving a 57% reduction in AbsRel and scaling with model size.

Bardienus Pieter Duisterhof, Deva Ramanan, Jeffrey Ichnowski et al.

2026-06-12 98