Regularizing Transformers With Deep Probabilistic Layers

Bibliographic Details
Published in	arXiv.org
Main Authors	Aurora Cobo Aguilera; Pablo Martínez Olmos; Antonio Artés-Rodríguez; Fernando Pérez-Cruz
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 23.08.2021
Subjects	Coders; Encoders-Decoders; Transformers

Summary:	Language models (LMs) have grown non-stop over the last decade, from sequence-to-sequence architectures to the state-of-the-art, purely attention-based Transformers. In this work, we demonstrate that including deep generative models within BERT yields more versatile models, able to impute missing or noisy words with richer text or even improve the BLEU score. More precisely, we use a Gaussian Mixture Variational Autoencoder (GMVAE) as a regularizer layer and demonstrate its effectiveness not only in Transformers but also in the most relevant encoder-decoder-based LMs: seq2seq with and without attention.
ISSN:	2331-8422