A Quantitative Characterization of Forgetting in Post-Training
Quantifies forgetting in generative models after post-training using forward and reverse KL objectives, while avoiding quality degradation.
Krishnakumar Balasubramanian, Shiva Prasad Kasiviswanathan
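For reference, the two objectives in standard form (notation assumed here, not taken from the paper): with base model p and post-trained model q,

    D_{\mathrm{KL}}(p \,\|\, q) = \mathbb{E}_{x \sim p}[\log p(x) - \log q(x)]   (forward KL)
    D_{\mathrm{KL}}(q \,\|\, p) = \mathbb{E}_{x \sim q}[\log q(x) - \log p(x)]   (reverse KL)

Forward KL is mode-covering: it diverges wherever q drops mass that p carried, so it directly tracks forgetting. Reverse KL is mode-seeking: it penalizes q for placing mass where p had little, i.e. for generating samples the base model deems unlikely, which tracks quality degradation.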
EnTransformer combines a Transformer backbone with engression for superior multivariate probabilistic forecasting.
Rajdeep Pathak, Rahul Goswami, Madhurima Panja et al.
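Engression (Shen and Meinshausen) trains a noise-injected generator by minimizing an energy-score loss; EnTransformer presumably plugs a Transformer in as that generator. A minimal PyTorch sketch of the loss, where the gen(x, eps) interface, noise_dim, and the stand-in MLP are illustrative assumptions rather than the paper's API:

    import torch

    def engression_loss(gen, x, y, m=8, noise_dim=16):
        # Monte Carlo energy-score loss: E||g - y|| - 0.5 * E||g - g'||,
        # estimated from m noise draws per input.
        samples = torch.stack(
            [gen(x, torch.randn(x.size(0), noise_dim)) for _ in range(m)])  # (m, B, d)
        fidelity = (samples - y.unsqueeze(0)).norm(dim=-1).mean()
        pair = (samples.unsqueeze(0) - samples.unsqueeze(1)).norm(dim=-1)   # (m, m, B)
        spread = pair.sum(dim=(0, 1)).mean() / (m * (m - 1))  # skip zero self-pairs
        return fidelity - 0.5 * spread

    class Gen(torch.nn.Module):
        # Stand-in MLP generator; EnTransformer would put a Transformer here.
        def __init__(self, d_in=10, d_out=3, noise_dim=16):
            super().__init__()
            self.net = torch.nn.Sequential(
                torch.nn.Linear(d_in + noise_dim, 64), torch.nn.ReLU(),
                torch.nn.Linear(64, d_out))
        def forward(self, x, eps):
            return self.net(torch.cat([x, eps], dim=-1))

    gen, x, y = Gen(), torch.randn(32, 10), torch.randn(32, 3)
    loss = engression_loss(gen, x, y)  # minimize to match the conditional law of y | x

Because the energy score is a proper scoring rule, minimizing it drives the generator's sample distribution toward the true conditional distribution of y given x, which is what makes the forecasts probabilistic rather than point estimates.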
NeFTY achieves high-accuracy 3D thermal diffusion reconstruction using a differentiable physics framework, significantly improving defect localization.
Tao Zhong, Yixun Hu, Dongzhe Zheng et al.
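The summary names the recipe but not the architecture; the sketch below is a generic, purely illustrative version of the differentiable-physics idea (2D instead of the paper's 3D, and none of NeFTY's actual components): differentiate through a finite-difference heat-diffusion solver to recover a diffusivity map whose anomalies mark defects.

    import torch

    def simulate(kappa, T0, steps=100, dt=0.1):
        # Explicit finite-difference heat diffusion with spatially varying
        # diffusivity kappa (periodic boundaries for brevity).
        T = T0
        for _ in range(steps):
            lap = T.roll(1, 0) + T.roll(-1, 0) + T.roll(1, 1) + T.roll(-1, 1) - 4 * T
            T = T + dt * kappa * lap
        return T

    T0 = torch.zeros(64, 64)
    T0[32, 32] = 100.0                              # point heat source
    with torch.no_grad():                           # synthesize "measurements"
        kappa_true = torch.full((64, 64), 0.2)
        kappa_true[20:28, 40:48] = 0.05             # hidden low-diffusivity defect
        T_obs = simulate(kappa_true, T0)

    kappa = torch.full((64, 64), 0.2, requires_grad=True)
    opt = torch.optim.Adam([kappa], lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        loss = ((simulate(kappa, T0) - T_obs) ** 2).mean()
        loss.backward()                             # gradients flow through the solver
        opt.step()
    # Defects show up as anomalies in the recovered kappa map.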
Leech Lattice Vector Quantization (LLVQ) achieves efficient LLM compression, outperforming QuIP# and QTIP.
Tycho F. A. van der Ouderaa, Mart van Baalen, Paul Whatmough et al.
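A full decoder for the 24-dimensional Leech lattice is too involved for a sketch, but the core move in any lattice VQ scheme is the same: snap each block of weights to the nearest lattice point and store only its index. The classic Conway-Sloane decoder for the simpler D_n lattice (integer vectors with even coordinate sum) illustrates it; for comparison, QuIP# builds on E8-based codebooks and QTIP on trellis-coded quantization.

    import torch

    def nearest_Dn(x):
        # Nearest D_n lattice point to x: round coordinatewise; if the
        # coordinate sum is odd, re-round the worst coordinate the other way.
        f = torch.round(x)
        if int(f.sum().round().item()) % 2 != 0:
            err = x - f
            i = torch.argmax(err.abs())
            f[i] += 1.0 if err[i] >= 0 else -1.0
        return f

    w = torch.randn(8)     # a block of 8 weights
    q = nearest_Dn(w)      # quantized block; only a lattice index needs storing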
Cross-species transfer learning improves electrophysiology-to-transcriptomics mapping accuracy in cortical GABAergic interneurons.
Theo Schwider, Ramin Ramezani
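No model details are given in the summary; a generic form of the transfer recipe is to pretrain a regressor on the data-rich species and fine-tune on the scarce target species. In the sketch below, all dimensions, data, and the mouse-to-human direction are placeholders, not the paper's setup:

    import torch
    import torch.nn as nn

    # Hypothetical dimensions: 40 electrophysiology features -> 500 gene targets.
    model = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 500))

    def fit(model, X, Y, lr, epochs):
        opt = torch.optim.Adam(
            [p for p in model.parameters() if p.requires_grad], lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            nn.functional.mse_loss(model(X), Y).backward()
            opt.step()

    X_mouse, Y_mouse = torch.randn(2000, 40), torch.randn(2000, 500)  # abundant
    X_human, Y_human = torch.randn(100, 40), torch.randn(100, 500)    # scarce

    fit(model, X_mouse, Y_mouse, lr=1e-3, epochs=200)  # 1) pretrain on mouse
    for p in model[0].parameters():                    # 2) freeze the shared
        p.requires_grad = False                        #    feature extractor,
    fit(model, X_human, Y_human, lr=1e-4, epochs=50)   #    fine-tune on human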
MLP layers in Transformers perform binary routing; validated on GPT-2, where removing the MLP layers increases perplexity by 43.3%.
Peter Balogh
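The 43.3% figure is the paper's; the sketch below only shows one way to run that kind of ablation with Hugging Face transformers (the paper's evaluation data and protocol are not given here). A forward hook that returns a value replaces the module's output, so zeroing every MLP output removes its contribution to the residual stream:

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    tok = GPT2TokenizerFast.from_pretrained("gpt2")

    # Zero out each block's MLP contribution to the residual stream.
    hooks = [blk.mlp.register_forward_hook(lambda m, i, o: torch.zeros_like(o))
             for blk in model.transformer.h]

    ids = tok("The quick brown fox jumps over the lazy dog.",
              return_tensors="pt").input_ids
    with torch.no_grad():
        ppl = model(ids, labels=ids).loss.exp().item()
    print(f"MLP-ablated perplexity: {ppl:.1f}")

    for h in hooks:
        h.remove()   # restore the original model

Comparing this value against the unhooked model's perplexity on the same text gives the relative degradation attributable to the MLP layers.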