gyxxyg / emoe
This project is forked from qiuzh20/emoe.
Official PyTorch Implementation of EMoE: Emergent Mixture-of-Experts: Can Dense Pre-trained Transformers Benefit from Emergent Modular Structures?
License: MIT License