Scalable Hyperparameter-Divergent Ensemble Training with Automatic Learning Rate Exploration for Large Models
Proposes Hyperparameter-Divergent Ensemble Training (HDET), a method to improve the optimization quality and generalization of large models through automatic learning rate exploration.
Hailing Cheng, Tao Huang, Chen Zhu et al.