cs.AI 2606.11173

The Role of Feedback Alignment in Self-Distillation

This paper introduces feedback alignment in self-distillation and compares three feedback types; structure-aligned critique outperforms the others, with a +16.11% accuracy gain.

Semih Kara, Oğuzhan Ersoy

2026-06-10 72
cs.DC 2606.11169

Piper: A Programmable Distributed Training System

Piper decouples training strategies via an intermediate representation (IR), enabling flexible multi-strategy scheduling with performance parity and efficiency gains.

Megan Frisella, Shubham Tiwari, Andy Ruan et al.

2026-06-10 59
stat.ML 2606.11156

Itô maps for any-step SDEs

Introduces Itô maps for arbitrary-step SDE sampling, enabling conditional sampling and control, enhancing diversity and efficiency.

Zhengkai Pan, Peter Potaptchik, Wenxi Yao et al.

2026-06-10 84
cs.LG 2606.11149

Efficiently Learning Drifting Halfspaces with Massart Noise

Proposes an efficient online algorithm for drifting halfspaces under Massart noise, achieving an error bound of η + Õ(Δ^{1/3}/γ), nearly matching theoretical limits.

Mingchen Ma, Guyang Cao, Jelena Diakonikolas et al.

2026-06-10 46
cs.AI 2606.11078

A History-Aware Visually Grounded Critic for Computer Use Agents

Proposes HiViG, a history-aware visually grounded test-time framework, boosting GUI task success rates by 5.8% (Qwen3-VL-32B) and 9% (Gemini-3-Flash) through macro-action history and visual error verification.

Jaewoo Lee, Zaid Khan, Archiki Prasad et al.

2026-06-10 95
cs.LG 2606.11057

Flexible Kernels for Protein Property Prediction

This paper introduces flexible sequence kernels based on evolutionary substitution matrices, leveraging Gaussian processes for data-efficient protein property prediction, outperforming embedding-based methods.

Martin Jankowiak, Yerdos Ordabayev, Rudraksh Tuwani et al.

2026-06-10 43
cs.RO 2606.10974

Language-Driven Cost Optimization for Autonomous Driving

This paper introduces a language-driven adaptive cost optimization framework for autonomous driving, leveraging GPT-4 to interpret natural language queries and adjust MPPI control parameters in real time.

Diego Martinez-Baselga, Khaled Mustafa, Javier Alonso-Mora

2026-06-09 54
cs.LG 2606.09821

Rethinking the Divergence Regularization in LLM RL

DRPO introduces smooth advantage-weighted quadratic regularization to improve stability and efficiency in LLM RL training, replacing hard masks with continuous gradient weights.

Jiarui Yao, Xiangxin Zhou, Penghui Qi et al.

2026-06-09 60