Sparse-MLP

Keywords: Mixture-of-Experts (MoE) architecture, conditional computation, cross-token modeling, Sparse-MLP model