A Dataset for Measuring Reading Levels In India At Scale

One in four children in India leaves grade eight without basic reading skills. Measuring reading levels in a country as vast as India poses significant hurdles. Recent advances in machine learning open up the possibility of automating this task. However, the datasets of children's...

Bibliographic Details
Published in Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 9210 - 9214
Main Authors Agarwal, Dolly; Gupchup, Jayant; Baghel, Nishant
Format Conference Proceeding
Language English
Published IEEE 01.05.2020
Abstract One in four children in India leaves grade eight without basic reading skills. Measuring reading levels in a country as vast as India poses significant hurdles. Recent advances in machine learning open up the possibility of automating this task. However, datasets of children's speech are not only rare but also primarily in English. To solve this assessment problem and advance deep learning research in regional Indian languages, we present the ASER dataset of children aged 6 to 14. The dataset consists of 81,330 labeled audio clips in Hindi, Marathi and English, recorded from 5,301 subjects. The labels represent expert opinions on each child's ability to read at a specified level. Using this dataset, we built a simple ASR-based classifier. Early results indicate that it can achieve a prediction accuracy of 86% for English. Because the ASER survey spans half a million subjects, the dataset could grow to that scale.
Author Agarwal, Dolly
Baghel, Nishant
Gupchup, Jayant
Author_xml – sequence: 1
  givenname: Dolly
  surname: Agarwal
  fullname: Agarwal, Dolly
  organization: Pratham Education Foundation
– sequence: 2
  givenname: Jayant
  surname: Gupchup
  fullname: Gupchup, Jayant
  organization: Pratham Volunteer
– sequence: 3
  givenname: Nishant
  surname: Baghel
  fullname: Baghel, Nishant
  organization: Pratham Education Foundation
ContentType Conference Proceeding
DOI 10.1109/ICASSP40776.2020.9053380
Discipline Engineering
EISBN 9781509066315
1509066314
EISSN 2379-190X
EndPage 9214
ExternalDocumentID 9053380
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 5
PublicationCentury 2000
PublicationDate 2020-May
PublicationDateYYYYMMDD 2020-05-01
PublicationDate_xml – month: 05
  year: 2020
  text: 2020-May
PublicationDecade 2020
PublicationTitle Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998)
PublicationTitleAbbrev ICASSP
PublicationYear 2020
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 9210
SubjectTerms Acoustics
Assessment
Conferences
Deep learning
EdTech
Machine Learning
Random forests
Reading Skills
Signal processing
Speech Dataset
Speech processing
Task analysis
Title A Dataset for Measuring Reading Levels In India At Scale
URI https://ieeexplore.ieee.org/document/9053380