Lightweight Adaptive Mixture of Neural and N-gram Language Models
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 20.04.2018 |
Summary: | The best-performing language model is often an ensemble
of a neural language model with n-grams. In this work, we propose a method to
improve how these two models are combined. By using a small network that
predicts the mixture weight between the two models, we adapt their relative
importance at each time step. Because the gating network is small, it trains
quickly on small amounts of held-out data and does not add overhead at scoring
time. Our experiments on the One Billion Word benchmark show a
significant improvement over the state-of-the-art ensemble without retraining
the basic modules. |
---|---|
DOI: | 10.48550/arxiv.1804.07705 |
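
The summary describes a gate that sets the mixture weight between the two models at each time step. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a single-layer logistic gate, and the feature vector, the function names (`gate_weight`, `mixture_prob`, `train_gate`), and the toy probabilities are all hypothetical. Both base models stay frozen; only the gate is fit, by gradient ascent on the log-likelihood of held-out examples.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate_weight(features, weights, bias):
    """Mixture weight in (0, 1) from a hypothetical linear gate."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


def mixture_prob(p_neural, p_ngram, lam):
    """Per-step interpolation: lam * p_neural + (1 - lam) * p_ngram."""
    return lam * p_neural + (1.0 - lam) * p_ngram


def avg_log_likelihood(examples, weights, bias):
    """Mean log-probability of the observed words under the gated mixture."""
    total = 0.0
    for features, p_nn, p_ng in examples:
        lam = gate_weight(features, weights, bias)
        total += math.log(mixture_prob(p_nn, p_ng, lam))
    return total / len(examples)


def train_gate(examples, n_features, lr=0.1, epochs=200):
    """Fit only the gate; the two base models are never updated.

    examples: list of (features, p_neural, p_ngram), where the two
    probabilities are what each frozen model assigned to the word
    that actually occurred at that time step.
    """
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, p_nn, p_ng in examples:
            lam = gate_weight(features, weights, bias)
            p_mix = mixture_prob(p_nn, p_ng, lam)
            # Gradient of log p_mix with respect to the pre-sigmoid activation.
            grad_z = (p_nn - p_ng) / p_mix * lam * (1.0 - lam)
            for i, f in enumerate(features):
                weights[i] += lr * grad_z * f
            bias += lr * grad_z
    return weights, bias


# Toy held-out data (made up): the feature is +1 in contexts where the
# neural model is the better predictor and -1 where the n-gram model is.
toy = [
    ([1.0], 0.60, 0.10),
    ([-1.0], 0.05, 0.50),
]
weights, bias = train_gate(toy, n_features=1)
```

With a fixed interpolation weight, both toy contexts would receive the same blend. The gated version can lean on the neural model in the first context and on the n-gram model in the second, which is the per-step adaptation the summary refers to.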