Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?
Self-distillation can degrade LLMs' reasoning in math by suppressing uncertainty expression.
Jeonghye Kim, Xufang Luo, Minbeom Kim et al.
TiCo method significantly enhances time control in dialogue models using Spoken Time Markers, reducing MAE to 4.54 seconds.
Kai-Wei Chang, Wei-Chih Chen, En-Pei Hu et al.
MemDLM embeds a simulated denoising process into training via bi-level optimization, enhancing DLM training efficiency and long-context understanding.
Zehua Pei, Hui-Ling Zhen, Weizhe Lin et al.
Semantic Token Clustering (STC) method achieves efficient uncertainty quantification in large language models, significantly reducing computational overhead.
Qi Cao, Andrew Gambardella, Takeshi Kojima et al.
Study of SFT-DPO interaction in small models reveals full fine-tuning outperforms LoRA.
Yuming Feng, Christy Yang
F2LLM-v2 offers efficient multilingual embeddings using two-stage training and matryoshka learning, supporting over 200 languages.
Ziyin Zhang, Zihan Liao, Hang Yu et al.
Nemotron-Cascade 2 achieves top-tier reasoning with Cascade RL and multi-domain distillation in a 30B MoE model.
Zhuolin Yang, Zihan Liu, Yang Chen et al.
VEPO enhances translation quality and tokenization efficiency for low-resource languages using reinforcement learning with verifiable rewards.
Chonghan Liu, Yimin Du, Qi An et al.
Efficient training-free multi-token prediction via embedding-space probing improves LLaMA3 acceptance length by 12%.
Raghavv Goel, Mukul Gagrani, Mingu Lee et al.
Mixture-of-Depths Attention (MoDA) improves downstream task performance by 2.11% on a 1.5B-parameter model with only a 3.7% increase in FLOPs.
Lianghui Zhu, Yuxin Fang, Bencheng Liao et al.
Correcting moral indifference in language models using Sparse Autoencoders, achieving a 75% win-rate on adversarial benchmarks.
Lingyu Li, Yan Teng, Yingchun Wang
Code-A1 enhances code and test generation through an adversarial co-evolution framework.
Aozhe Wang, Yuchen Yan, Nan Zhou et al.
NAIT framework selects efficient instruction tuning data via neuron activation patterns, enhancing LLM performance.
Xin Chen, Junchao Wu, Shu Yang et al.
ESG-Bench significantly reduces hallucinations in long-context ESG report analysis using task-specific Chain-of-Thought prompting strategies.
Siqi Sun, Ben Peng Wu, Mali Jin et al.
WALAR method enhances low-resource language translation using monolingual data, surpassing the LLaMAX model.
Yifeng Liu, Siqi Ouyang, Yatish Hosmane Revanasiddappa et al.
PCA sweep method optimizes dimension selection in SSD, enhancing interpretability and stability.
Hubert Plisiecki, Maria Leniarska, Jan Piotrowski et al.
Long-form RewardBench evaluates reward models for long-form generation, revealing current models' deficiencies in long-form reward modeling.
Hui Huang, Yancheng He, Wei Liu et al.
HMS-BERT uses hybrid multi-task self-training for multilingual, multi-label cyberbullying detection, achieving a macro F1-score of 0.9847.
Zixin Feng, Xinying Cui, Yifan Sun et al.
Idea-Catalyst framework boosts scientific creativity via interdisciplinary insights, improving novelty by 21% and insightfulness by 16%.
Priyanka Kargupta, Shuhaib Mehri, Dilek Hakkani-Tur et al.
CLASP model detects malicious tokens using an XGBoost classifier, achieving a 95.9% token-level F1 score.
Alexandre Le Mercier, Thomas Demeester, Chris Develder