Multi-Turn Multi-Agent Dialogue for Collaborative Reconstruction Improves VLM Performance on Spatial Reasoning, But Only Barely
This study introduces a multi-turn, multi-agent dialogue framework for evaluating vision-language models (VLMs) on spatial reasoning. The framework yields only limited improvements, mainly because of visual grounding challenges.
Chalamalasetti Kranti, Sherzod Hakimov, David Schlangen