Vector Policy Optimization: Training for Diversity Improves Test-Time Search
Vector Policy Optimization (VPO) trains diverse policies to improve test-time search, achieving over 20% gains on best@k metrics across multiple tasks.
Ryan Bahlous-Boldi, Isha Puri, Idan Shenfeld et al.