cs.SE 2603.23448

Code Review Agent Benchmark

The c-CRAB dataset evaluates the abilities of code review agents; current agents solve only 40% of its tasks.

Yuntong Zhang, Zhiyuan Pan, Imam Nur Bani Yusuf et al.

2026-03-25 39