cs.AI 2605.12474

Reward Hacking in Rubric-Based Reinforcement Learning

The study proposes a framework for diagnosing reward hacking in rubric-based RL and finds that even strong verification does not eliminate it.

Anas Mahmoud, MohammadHossein Rezaei, Zihao Wang et al.

2026-05-13 220