A Survey on Mixture of Experts
Format: Journal Article
Language: English
Published: 26.06.2024
Summary: Large language models (LLMs) have achieved unprecedented advances across
diverse fields, ranging from natural language processing to computer vision and
beyond. The prowess of LLMs is underpinned by their substantial model size,
extensive and diverse datasets, and the vast computational power harnessed
during training, all of which contribute to the emergent abilities of LLMs
(e.g., in-context learning) that are not present in small models. Within this
context, the mixture of experts (MoE) has emerged as an effective method for
substantially scaling up model capacity with minimal computation overhead,
gaining significant attention from academia and industry. Despite its growing
prevalence, the literature still lacks a systematic and comprehensive review of
MoE. This survey seeks to bridge that gap, serving as an essential resource for
researchers delving into the intricacies of MoE. We first briefly introduce the
structure of the MoE layer and then propose a new taxonomy of MoE. Next, we
provide an overview of the core designs of various MoE models, covering both
algorithmic and system-level aspects, alongside collections of available
open-source implementations, hyperparameter configurations, and empirical
evaluations. Furthermore, we delineate the multifaceted applications of MoE in
practice, and outline some potential directions for future research. To
facilitate ongoing updates and the sharing of cutting-edge developments in MoE
research, we have established a resource repository accessible at
https://github.com/withinmiaov/A-Survey-on-Mixture-of-Experts.
DOI: 10.48550/arxiv.2407.06204
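
The abstract's central claim, that MoE scales model capacity with minimal computation overhead by activating only a few experts per token, can be illustrated with a short sketch. The PyTorch snippet below is a minimal, illustrative top-k gated MoE layer; it is not code from the survey or its repository, and the class name MoELayer and all hyperparameters (d_model, d_hidden, num_experts, top_k) are assumptions chosen for demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Illustrative sparsely gated MoE layer with top-k routing (hypothetical sizes)."""
    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Gating network: one routing score per expert for every token.
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an independent position-wise feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                          # x: (num_tokens, d_model)
        scores = self.gate(x)                      # (num_tokens, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)      # renormalize over the selected experts
        out = torch.zeros_like(x)
        # Sparse dispatch: each token is processed by only its top-k experts.
        for e, expert in enumerate(self.experts):
            token_ids, slots = (top_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slots].unsqueeze(-1) * expert(x[token_ids])
        return out

tokens = torch.randn(16, 512)                      # a toy batch of 16 token embeddings
print(MoELayer()(tokens).shape)                    # torch.Size([16, 512])
```

Because each token is routed to only top_k of the num_experts feed-forward networks, the parameter count grows with the number of experts while per-token compute stays roughly constant, which is the scaling property the abstract highlights.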