Normal Guidance is what Attention Needs
Proposed Normal Guidance regularization improves attention-based MIL slice-level localization on 4M+ CT slices, outperforming baselines.
Ethan Harvey, Dennis Johan Loevlie, Michael C. Hughes
Unified Neural Scaling Law (UNSL) models multi-dimensional scaling of deep networks, improving performance extrapolation accuracy by over 10%.
Ethan Caballero, Priyank Jaini, David Krueger et al.
Vector Policy Optimization (VPO) trains diverse policies to improve test-time search, achieving over 20% gains on best@k metrics across multiple tasks.
Ryan Bahlous-Boldi, Isha Puri, Idan Shenfeld et al.
Introduces curiosity-driven 3D exploration using a persistent 3D Gaussian Splatting world model and a Transformer policy, achieving 74.94% 3D coverage on HM3D.
Lily Goli, Justin Kerr, Daniele Reda et al.
The Matching Principle unifies nuisance-robust learning by estimating the deployment nuisance covariance and regularizing the encoder Jacobian accordingly; validated on Qwen2.5-7B.
Vishal Rajput
Pion optimizer preserves spectrum via orthogonal equivalence transformation, enhancing LLM training stability.
Kexuan Shi, Hanxuan Li, Zeju Qiu et al.
Proposes a sparse-to-dense reward principle combining GRPO and OPD to enhance language model post-training.
Yuanda Xu, Hejian Sang, Zhengze Zhou et al.
MEME evaluates multi-entity and evolving memory tasks, exposing dependency reasoning failures in current systems.
Seokwon Jung, Alexander Rubinstein, Arnas Uselis et al.
The paper introduces a parameter-free online K-Means router leveraging geometric coupling for effective expert assignment, reducing load imbalance with only a slight perplexity increase.
Sagi Ahrac, Noya Hochwald, Mor Geva
KV-Fold: A training-free protocol for long-context inference achieving 100% exact-match retrieval.
Alireza Nadali, Patrick Cooper, Ashutosh Trivedi et al.
Attractor Models enhance language modeling and reasoning via fixed-point solving, improving training efficiency by 46.6% and accuracy by 19.7%.
Jacob Fein-Ashley, Paria Rashidinejad
Multi-stream LLMs unlock language models with parallel streams of thoughts, inputs, and outputs, enhancing efficiency and security.
Guinan Su, Yanwu Yang, Xueyan Li et al.
Proposes the HDET method to improve optimization quality and generalization of large models via automatic learning rate exploration.
Hailing Cheng, Tao Huang, Chen Zhu et al.
Efficient learning by implicit exploration in bandit problems with side observations, achieving near-optimal regret guarantees.
Tomas Kocak, Gergely Neu, Michal Valko et al.
Kolmogorov-Arnold Networks achieve universality with a single non-affine function.
Vugar Ismailov
Proposes a Collocation-based Robust Physics-Informed Neural Network (CRVPINN) for simulating pollution propagation under thermal inversion conditions on Spitsbergen.
Leszek Siwik, Maciej Sikora, Natalia Leszczyńska et al.
Budget-efficient scaling law fitting via active experiment selection achieves full dataset performance using only 10% of the budget.
Sijie Li, Shanda Li, Haowei Lin et al.
Using BantuMorph v7, a neural model recovers historical lexical structures in Bantu languages from modern data, confirming that 90.9% of noun candidates align with Proto-Bantu forms.
Hillary Mutisya, John Mugane
Zero-shot morphological discovery in low-resource Bantu languages via cross-lingual transfer and unsupervised clustering.
Hillary Mutisya, John Mugane
WG-SRC provides operational feature fingerprints for graph datasets using a white-box signal-subspace probe, enhancing node classification accuracy.
Yuchen Xiong, Swee Keong Yeap, Zhen Hong Ban