cs.CL 2603.22241

MemDLM: Memory-Enhanced DLM Training

MemDLM embeds a simulated denoising process into training via bi-level optimization, improving the training efficiency and long-context understanding of diffusion language models (DLMs).

Zehua Pei, Hui-Ling Zhen, Weizhe Lin et al.

2026-03-24 94

cs.CL 2603.15619

Mixture-of-Depths Attention

Mixture-of-Depths Attention (MoDA) improves downstream task performance by 2.11% on a 1.5B-parameter model with only a 3.7% increase in FLOPs.

Lianghui Zhu, Yuxin Fang, Bencheng Liao et al.

2026-03-17 123