Multilayer neural network language model training method and device based on knowledge distillation

Bibliographic Details
Main Authors: LI WENTING, ZHU QUANYIN, YAO NINGBO, YU KUN, CHEN XIAOBING, LI WEI, ZHOU HONG, ZHANG ZHENGWEI, XIANG LIN, GAO SHANGBING, WANG TONGYANG
Format: Patent
Language: Chinese, English
Published: 01.09.2020

Summary: The invention discloses a multilayer neural network language model training method and device based on knowledge distillation. First, a BERT language model and a multi-layer BiLSTM model are constructed to serve as the teacher model and the student model; the constructed BERT language model comprises six layers of Transformers, and the multi-layer BiLSTM model comprises three layers of BiLSTM networks. Then, after the text corpus set is preprocessed, the BERT language model is trained to obtain the trained teacher model. Next, the preprocessed text corpus set is input into the multi-layer BiLSTM model to train the student model based on the knowledge distillation technique, where different spatial representations are calculated through linear transformations when learning from the embedding layer, hidden layer, and output layer of the teacher model. Based on the trained student model, text can be converted into vectors, and a downstream network can then be trained for better text classification.
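The abstract describes a layer-wise distillation scheme in which linear transformations project the student's representations into the teacher's embedding, hidden, and output spaces. The PyTorch sketch below illustrates one plausible reading of that setup; all dimensions, loss weights, the pooling choice, and the random stand-ins for the frozen teacher's outputs are illustrative assumptions, not details taken from the patent.

```python
# Minimal sketch of the layer-wise distillation described in the abstract.
# Dimensions, weights, and teacher stand-ins are illustrative assumptions.
import torch
import torch.nn as nn

class StudentBiLSTM(nn.Module):
    """Three-layer BiLSTM student with linear projections into the
    teacher's embedding and hidden spaces."""
    def __init__(self, vocab_size=30522, emb_dim=128, hid_dim=256,
                 teacher_dim=768, num_labels=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hid_dim, num_layers=3,
                              bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hid_dim, num_labels)
        # Linear transformations mapping each student space onto the
        # corresponding teacher space, as the abstract describes.
        self.proj_emb = nn.Linear(emb_dim, teacher_dim)
        self.proj_hid = nn.Linear(2 * hid_dim, teacher_dim)

    def forward(self, input_ids):
        emb = self.embedding(input_ids)         # (B, T, emb_dim)
        hid, _ = self.bilstm(emb)               # (B, T, 2 * hid_dim)
        logits = self.classifier(hid[:, 0, :])  # first-token pooling (assumed)
        return emb, hid, logits

def distillation_loss(student, input_ids,
                      teacher_emb, teacher_hid, teacher_logits,
                      w_emb=1.0, w_hid=1.0, w_out=1.0):
    """MSE between the projected student representations and the teacher's
    embedding-layer, hidden-layer, and output-layer representations."""
    emb, hid, logits = student(input_ids)
    mse = nn.MSELoss()
    return (w_emb * mse(student.proj_emb(emb), teacher_emb)
            + w_hid * mse(student.proj_hid(hid), teacher_hid)
            + w_out * mse(logits, teacher_logits))

# Toy usage with random tensors standing in for the frozen teacher's outputs.
student = StudentBiLSTM()
ids = torch.randint(0, 30522, (4, 16))
t_emb = torch.randn(4, 16, 768)    # teacher embedding-layer output
t_hid = torch.randn(4, 16, 768)    # teacher hidden-layer output
t_logits = torch.randn(4, 2)       # teacher output logits
loss = distillation_loss(student, ids, t_emb, t_hid, t_logits)
loss.backward()
```

In practice the teacher representations would come from the trained six-layer BERT model rather than random tensors, and which teacher layers are matched to which student layers is a design choice the abstract does not specify.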
Bibliography: Application Number: CN202010322267