A Data Set of Final Year High School Examination Texts of South African Home and First Additional Language Subjects

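Bibliographic Details
Published in: Journal of Open Humanities Data, Vol. 9, p. 9
Main authors: Sibeko, Johannes (ORCID 0000-0003-3586-7491); van Zaanen, Menno (ORCID 0000-0003-1841-2444)
Format: Journal article
Language: English
Published: Ubiquity Press, 5 July 2023
Subjects: examination texts; final year high school; indigenous languages; linguistic corpus; reading comprehension; summary writing
DOI: 10.5334/johd.108
EISSN: 2059-481X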

Abstract

This article describes a data set of reading comprehension and summary writing texts that were used in final-year high school examinations in South Africa between 2008 and 2020. It contains texts for eleven official South African languages. PDF versions of the texts stem from South Africa’s Department of Basic Education’s online public access repository. Plain text is extracted from the PDFs and the texts are tokenized. The data set contains 429 full-text files with 929 manually extracted comprehension and summary writing texts. The data is useful for studies investigating, e.g., linguistic properties, text readability, text properties, and linguistic complexity in any of the eleven languages. Furthermore, both intra-language and inter-language comparisons or investigations can be made.
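The abstract names text properties and linguistic complexity as possible uses of the tokenized texts. As a minimal illustration only (not the authors' tooling), the Python sketch below computes a few surface statistics (token count, type count, type-token ratio, mean token length) for tokenized texts and places several texts side by side for comparison. It assumes each text is already available as a whitespace-tokenized string; the data set's actual file layout is not described in this record, so loading is left out. The names `profile_text`, `compare` and `TextProfile` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TextProfile:
    """Simple surface statistics for one tokenized text."""
    tokens: int
    types: int
    type_token_ratio: float
    mean_token_length: float


def profile_text(tokenized_text: str) -> TextProfile:
    # Assumption: tokens are separated by whitespace, as in a
    # pre-tokenized plain-text file; punctuation counts as a token.
    tokens = tokenized_text.split()
    if not tokens:
        return TextProfile(0, 0, 0.0, 0.0)
    # Case-fold so that "The" and "the" count as one type.
    counts = Counter(token.lower() for token in tokens)
    return TextProfile(
        tokens=len(tokens),
        types=len(counts),
        type_token_ratio=len(counts) / len(tokens),
        mean_token_length=sum(len(t) for t in tokens) / len(tokens),
    )


def compare(texts: dict[str, str]) -> dict[str, TextProfile]:
    """Profile several texts (e.g. one per language or per year) side by side."""
    return {label: profile_text(text) for label, text in texts.items()}


if __name__ == "__main__":
    # Placeholder strings, not texts from the data set.
    for label, p in compare({"sample": "the cat sat on the mat ."}).items():
        print(label, p)
```

Raw type-token ratio falls as texts get longer, so comparisons between texts of very different lengths (for instance a short summary passage against a long comprehension passage) should be read with care.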