RobeCzech: Czech RoBERTa, a Monolingual Contextualized Language Representation Model

We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses current state of the art in all five evaluated NLP tasks and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base.

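The released checkpoint can be loaded directly from the Hugging Face hub. The following is a minimal sketch, not taken from the paper itself: it assumes the Hugging Face transformers Python library is installed, uses the model ID ufal/robeczech-base from the release URL above, and the example sentence is illustrative only.

    # Minimal sketch (assumption: transformers and torch are installed).
    # The model ID comes from the release URL above.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("ufal/robeczech-base")
    model = AutoModel.from_pretrained("ufal/robeczech-base")

    # Encode an illustrative Czech sentence ("RobeCzech is a Czech model.")
    inputs = tokenizer("RobeCzech je český model.", return_tensors="pt")
    outputs = model(**inputs)

    # Contextualized embeddings for each subword token; for a base-sized
    # RoBERTa the hidden dimension is 768.
    print(outputs.last_hidden_state.shape)
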
Bibliographic Details
Published in: Text, Speech, and Dialogue, Vol. 12848, pp. 197-209
Main Authors: Straka, Milan; Náplava, Jakub; Straková, Jana; Samuel, David
Format: Book Chapter
Language: English
Published: Switzerland: Springer International Publishing AG, 2021
Series: Lecture Notes in Computer Science
ISBN: 303083526X; 9783030835262
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/978-3-030-83527-9_17
