GOVERN: Gradient Orientation Vote Ensemble for Multi-Teacher Reinforced Distillation
| Main Authors | |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | 06.05.2024 |
| Subjects | |
Summary: Pre-trained language models have become an integral component of question-answering systems, achieving remarkable performance. However, for practical deployment, it is crucial to perform knowledge distillation to maintain high performance while operating under computational constraints. In this paper, we address a key question: given the importance of unsupervised distillation for student model performance, how can knowledge from multiple teacher models be effectively ensembled during this stage without the guidance of labels? We propose a novel algorithm, GOVERN, to tackle this issue. GOVERN has demonstrated significant improvements in both offline and online experiments, enabling the student model to achieve results comparable to those of a teacher ensemble. Our experiments show that GOVERN requires a mere 1% of the ensemble method's inference budget to achieve 99.5% of its performance. The proposed algorithm has been successfully deployed in a real-world commercial question-answering system, demonstrating its practical applicability.
DOI: 10.48550/arxiv.2405.03764
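The abstract does not spell out the mechanics of the gradient-orientation vote, so the sketch below is only one plausible reading of the title, not the authors' verified algorithm. The idea illustrated: for each unlabeled example, every teacher's soft label induces a gradient direction on the student's logits (for a KL/cross-entropy objective this is proportional to student probabilities minus teacher probabilities), teachers vote per output dimension by the sign of that gradient, and only teachers agreeing with the majority contribute to the distillation target. The function name `govern_style_targets`, the temperature parameter, and the agreement rule are all assumptions made for illustration.

```python
# Illustrative sketch only: one plausible reading of "gradient orientation vote"
# for label-free multi-teacher distillation. The objective and the sign-agreement
# rule are assumptions, not the paper's documented algorithm.
import torch
import torch.nn.functional as F

def govern_style_targets(student_logits, teacher_logits_list, temperature=1.0):
    """Combine several teachers' soft labels via a per-dimension gradient-sign vote.

    For soft-label distillation with a KL objective, the gradient of the loss
    w.r.t. the student logits is proportional to (student_probs - teacher_probs),
    so each teacher's "gradient orientation" reduces to the sign of that residual.
    """
    p_student = F.softmax(student_logits / temperature, dim=-1)           # (B, C)
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    )                                                                      # (T, B, C)

    # Per-teacher gradient direction w.r.t. the student logits.
    grad_sign = torch.sign(p_student.unsqueeze(0) - teacher_probs)        # (T, B, C)

    # Majority vote over teachers on each logit dimension.
    majority = torch.sign(grad_sign.sum(dim=0))                           # (B, C)

    # Keep only teachers whose orientation agrees with the majority, then
    # average their probabilities to form the distillation target.
    agree = (grad_sign == majority.unsqueeze(0)).float()                  # (T, B, C)
    weights = agree / agree.sum(dim=0).clamp(min=1.0)                     # per-dim normalisation
    target = (weights * teacher_probs).sum(dim=0)                         # (B, C)
    return target / target.sum(dim=-1, keepdim=True).clamp(min=1e-8)

# Usage sketch: the student would minimise KL(target || student) on unlabeled queries.
```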