RobeCzech: Czech RoBERTa, a Monolingual Contextualized Language Representation Model

We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses current state of the art in all five evaluated NLP tasks and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base.

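The released checkpoint can be loaded directly from the Hugging Face hub. The following is a minimal sketch, not taken from the paper itself: it assumes the Hugging Face transformers Python library is installed, uses the model ID ufal/robeczech-base from the release URL above, and the example sentence is illustrative only.

    # Minimal sketch (assumption: transformers and torch are installed).
    # The model ID comes from the release URL above.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("ufal/robeczech-base")
    model = AutoModel.from_pretrained("ufal/robeczech-base")

    # Encode an illustrative Czech sentence ("RobeCzech is a Czech model.")
    inputs = tokenizer("RobeCzech je český model.", return_tensors="pt")
    outputs = model(**inputs)

    # Contextualized embeddings for each subword token; for a base-sized
    # RoBERTa the hidden dimension is 768.
    print(outputs.last_hidden_state.shape)
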
Bibliographic Details
Published in: Text, Speech, and Dialogue, Vol. 12848, pp. 197-209
Main Authors: Straka, Milan; Náplava, Jakub; Straková, Jana; Samuel, David
Format: Book Chapter
Language: English
Published: Switzerland: Springer International Publishing AG, 2021
Series: Lecture Notes in Computer Science
ISBN: 303083526X; 9783030835262
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/978-3-030-83527-9_17
