Paper Insights - AI Arxiv Paper Analysis

cs.LG 2605.27306

Normal Guidance is what Attention Needs

Proposes Normal Guidance, a regularizer that improves slice-level localization for attention-based MIL on 4M+ CT slices, outperforming baselines.

Ethan Harvey, Dennis Johan Loevlie, Michael C. Hughes

2026-05-27 81

cs.LG 2605.26248

Unified Neural Scaling Laws

Unified Neural Scaling Law (UNSL) models multi-dimensional scaling of deep networks, improving performance extrapolation accuracy by over 10%.

Ethan Caballero, Priyank Jaini, David Krueger et al.

2026-05-26 66

cs.LG 2605.22817

Vector Policy Optimization: Training for Diversity Improves Test-Time Search

Vector Policy Optimization (VPO) trains diverse policies to improve test-time search, achieving over 20% gains on best@k metrics across multiple tasks.

Ryan Bahlous-Boldi, Isha Puri, Idan Shenfeld et al.

2026-05-22 58

cs.LG 2605.22814

Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration

Introduces curiosity-driven 3D exploration using a persistent 3D Gaussian Splatting world model and a Transformer policy, achieving 74.94% 3D coverage on HM3D.

Lily Goli, Justin Kerr, Daniele Reda et al.

2026-05-22 49

cs.LG 2605.22800

The Matching Principle: A Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning

The Matching Principle unifies nuisance-robust learning by estimating the deployment nuisance covariance and regularizing the encoder Jacobian accordingly; validated on Qwen2.5-7B.

Vishal Rajput

2026-05-22 51

cs.LG 2605.12492

Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

Pion optimizer preserves spectrum via orthogonal equivalence transformation, enhancing LLM training stability.

Kexuan Shi, Hanxuan Li, Zeju Qiu et al.

2026-05-13 78

cs.LG 2605.12483

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

Proposes a sparse-to-dense reward principle combining GRPO and OPD to enhance language model post-training.

Yuanda Xu, Hejian Sang, Zhengze Zhou et al.

2026-05-13 205

cs.LG 2605.12477

MEME: Multi-entity & Evolving Memory Evaluation

MEME evaluates multi-entity and evolving memory tasks, exposing dependency reasoning failures in current systems.

Seokwon Jung, Alexander Rubinstein, Arnas Uselis et al.

2026-05-13 168

cs.LG 2605.12476

Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts

The paper introduces a parameter-free online K-Means router leveraging geometric coupling for effective expert assignment, reducing load imbalance with only a slight perplexity increase.

Sagi Ahrac, Noya Hochwald, Mor Geva

2026-05-13 77

cs.LG 2605.12471

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

KV-Fold: A training-free protocol for long-context inference achieving 100% exact-match retrieval.

Alireza Nadali, Patrick Cooper, Ashutosh Trivedi et al.

2026-05-13 105

cs.LG 2605.12466

Solve the Loop: Attractor Models for Language and Reasoning

Attractor Models enhance language modeling and reasoning via fixed-point solving, improving training efficiency by 46.6% and accuracy by 19.7%.

Jacob Fein-Ashley, Paria Rashidinejad

2026-05-13 268

cs.LG 2605.12460

Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs

Multi-stream LLMs unblock language models with parallel streams of thoughts, inputs, and outputs, enhancing efficiency and security.

Guinan Su, Yanwu Yang, Xueyan Li et al.

2026-05-13 114

cs.LG 2604.24708

Scalable Hyperparameter-Divergent Ensemble Training with Automatic Learning Rate Exploration for Large Models

Proposes HDET method to improve optimization quality and generalization of large models via automatic learning rate exploration.

Hailing Cheng, Tao Huang, Chen Zhu et al.

2026-04-28 87

cs.LG 2604.24555

Efficient learning by implicit exploration in bandit problems with side observations

Implicit exploration enables efficient learning in bandit problems with side observations, achieving near-optimal regret guarantees.

Tomas Kocak, Gergely Neu, Michal Valko et al.

2026-04-27 140 citations 138

cs.LG 2604.23765

Necessary and sufficient conditions for universality of Kolmogorov-Arnold networks

Kolmogorov-Arnold Networks achieve universality with a single non-affine function.

Vugar Ismailov

2026-04-26 88

cs.LG 2604.23003

Collocation-based Robust Physics Informed Neural Networks for time-dependent simulations of pollution propagation under thermal inversion conditions on Spitsbergen

Proposes a Collocation-based Robust Physics-Informed Neural Network (CRVPINN) for time-dependent simulation of pollution propagation under thermal inversion conditions on Spitsbergen.

Leszek Siwik, Maciej Sikora, Natalia Leszczyńska et al.

2026-04-25 78

cs.LG 2604.22753

Spend Less, Fit Better: Budget-Efficient Scaling Law Fitting via Active Experiment Selection

Budget-efficient scaling law fitting via active experiment selection matches full-dataset performance using only 10% of the budget.

Sijie Li, Shanda Li, Haowei Lin et al.

2026-04-25 170

cs.LG 2604.22730

Neural Recovery of Historical Lexical Structure in Bantu Languages from Modern Data

Using BantuMorph v7, a neural model recovers historical lexical structure in Bantu languages from modern data, confirming that 90.9% of noun candidates align with Proto-Bantu forms.

Hillary Mutisya, John Mugane

2026-04-25 109

cs.LG 2604.22723

Zero-Shot Morphological Discovery in Low-Resource Bantu Languages via Cross-Lingual Transfer and Unsupervised Clustering

Zero-shot morphological discovery in low-resource Bantu languages via cross-lingual transfer and unsupervised clustering.

Hillary Mutisya, John Mugane

2026-04-25 97

cs.LG 2604.22676

Operational Feature Fingerprints of Graph Datasets via a White-Box Signal-Subspace Probe

WG-SRC provides operational feature fingerprints for graph datasets using a white-box signal-subspace probe, enhancing node classification accuracy.

Yuchen Xiong, Swee Keong Yeap, Zhen Hong Ban

2026-04-25 91