RobeCzech: Czech RoBERTa, a Monolingual Contextualized Language Representation Model
| Published in | Text, Speech, and Dialogue, Vol. 12848, pp. 197-209 |
|---|---|
| Main Authors | Milan Straka, Jakub Náplava, Jana Straková, David Samuel |
| Format | Book Chapter |
| Language | English |
| Published | Switzerland: Springer International Publishing, 2021 |
| Series | Lecture Notes in Computer Science |
| ISBN | 303083526X; 9783030835262 |
| ISSN | 0302-9743; 1611-3349 |
| DOI | 10.1007/978-3-030-83527-9_17 |
Summary: We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses current state of the art in all five evaluated NLP tasks and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base.
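Since the summary notes that the checkpoint is published on the Hugging Face hub as ufal/robeczech-base, the snippet below is a minimal sketch of loading it for masked-token prediction with the standard Hugging Face transformers API (AutoTokenizer and AutoModelForMaskedLM); the Czech example sentence is illustrative and not from the paper, and the model card on the hub remains the canonical usage reference.

```python
# Minimal sketch: load the released RobeCzech checkpoint and fill a masked
# token. Assumes the `transformers` and `torch` packages are installed.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("ufal/robeczech-base")
model = AutoModelForMaskedLM.from_pretrained("ufal/robeczech-base")

# Illustrative Czech sentence: "The capital of the Czech Republic is <mask>."
text = f"Hlavní město České republiky je {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and print the five highest-scoring tokens.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_index].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```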