A Data Set of Final Year High School Examination Texts of South African Home and First Additional Language Subjects
Published in | Journal of Open Humanities Data, Vol. 9, p. 9 |
---|---|
Main Authors | Sibeko, Johannes; van Zaanen, Menno |
Format | Journal Article |
Language | English |
Published | Ubiquity Press, 5 July 2023 |
Abstract | This article describes a data set of reading comprehension and summary writing texts used in final-year high school examinations in South Africa between 2008 and 2020. It contains texts in eleven official South African languages. PDF versions of the texts were obtained from the online public access repository of South Africa’s Department of Basic Education. Plain text is extracted from the PDFs and then tokenized. The data set contains 429 full-text files with 929 manually extracted comprehension and summary writing texts. The data are useful for studies of, for example, linguistic properties, text properties, text readability, and linguistic complexity in any of the eleven languages. The data also support both intra-language and inter-language comparisons. |
---|---|
Author | Sibeko, Johannes; van Zaanen, Menno |
Author ORCIDs | Sibeko, Johannes (0000-0003-3586-7491); van Zaanen, Menno (0000-0003-1841-2444) |
DOI | 10.5334/johd.108 |
DatabaseName | CrossRef DOAJ Directory of Open Access Journals |
EISSN | 2059-481X |
EndPage | 9 |
ISSN | 2059-481X |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
OpenAccessLink | https://doaj.org/article/5bf060e707a24bfdbece838e216442c4 |
PageCount | 1 |
PublicationDate | 20230705 |
PublicationDateYYYYMMDD | 2023-07-05 |
PublicationTitle | Journal of Open Humanities Data |
PublicationYear | 2023 |
Publisher | Ubiquity Press |
StartPage | 9 |
SubjectTerms | examination texts; final year high school; indigenous languages; linguistic corpus; reading comprehension; summary writing |
Title | A Data Set of Final Year High School Examination Texts of South African Home and First Additional Language Subjects |
URI | https://doaj.org/article/5bf060e707a24bfdbece838e216442c4 |
Volume | 9 |