cs.LG 2605.12477

MEME: Multi-entity & Evolving Memory Evaluation

MEME evaluates multi-entity and evolving memory tasks, exposing dependency-reasoning failures in current systems.

Seokwon Jung, Alexander Rubinstein, Arnas Uselis et al.

2026-05-13 168

cs.AI 2605.12474

Reward Hacking in Rubric-Based Reinforcement Learning

The study proposes a framework for diagnosing reward hacking in rubric-based RL and finds that even strong verification does not eliminate it.

Anas Mahmoud, MohammadHossein Rezaei, Zihao Wang et al.

2026-05-13 223