A large-scaled corpus for assessing text readability

Bibliographic Details
Published in	Behavior research methods Vol. 55; no. 2; pp. 491-507
Main Authors	Crossley, Scott; Heintz, Aron; Choi, Joon Suh; Batchelor, Jordan; Karimi, Mehrnoush; Malatinszky, Agnes
Format	Journal Article
Language	English
Published	New York: Springer US, 01.02.2023; Springer Nature B.V
Subjects	Behavioral Science and Psychology; Cognitive Psychology; Comprehension; Humans; Psychology; Publishing; Reading; Reproducibility of Results; Writing; Natural language processing; Readability; Corpus linguistics; Readability formulas
ISSN	1554-3528 1554-351X
DOI	10.3758/s13428-022-01802-x

Summary:	This paper introduces the CommonLit Ease of Readability (CLEAR) corpus, which provides unique readability scores for ~5000 text excerpts along with each excerpt’s year of publication, genre, and other metadata. The CLEAR corpus will provide researchers interested in discourse processing and reading with a resource from which to develop and test readability metrics and to model text readability. The CLEAR corpus improves on previous readability corpora in its size, in the breadth of the excerpts available, which cover over 250 years of writing in two different genres, and in the unique readability criterion provided for each text, which is based on teachers’ ratings of text difficulty for student readers. This paper discusses the development of the corpus and presents reliability metrics for the human ratings of readability.