Improving mathematics assessment readability: Do large language models help?

Bibliographic Details
Published in: Journal of Computer Assisted Learning, Vol. 39, No. 3, pp. 804–822
Main Authors: Patel, Nirmal; Nagpal, Pooja; Shah, Tirth; Sharma, Aditya; Malvi, Shrey; Lomas, Derek
Format: Journal Article
Language: English
Published: Chichester, UK: John Wiley & Sons, Inc., 01.06.2023
Summary:
Background: Readability metrics provide an objective and efficient way to assess the quality of educational texts. Readability measures can be used to find assessment items that are hard to read for a given grade level. Hard-to-read math word problems can put students who are behind in their literacy learning at a disadvantage: despite adequate math ability, these students can perform poorly on difficult-to-read word problems because of weak reading skills. Less readable math tests can also create equity issues for students who are relatively new to the language of assessment, and they can undermine the assessment's construct validity by partially measuring reading comprehension.

Objectives: This study shows how large language models can help improve the readability of math assessment items.

Methods: We analysed 250 test items from grades 3 to 5 of EngageNY, an open-source curriculum. We used the GPT-3 AI system to simplify the text of these math word problems, using text prompts and the few-shot learning method for the simplification task.

Results and Conclusions: On average, GPT-3 produced output passages with improved readability metrics, but the outputs contained a large amount of noise and were often unrelated to the input. We used thresholds over text similarity metrics and over changes in readability measures to filter out the noise, and we found meaningful simplifications that can be given to item authors as suggestions for improvement.

Takeaways: GPT-3 is capable of simplifying hard-to-read math word problems, but it generates noisy simplifications under both the text-prompt and few-shot learning methods. The noise can be filtered using text similarity and readability measures. The meaningful simplifications the model produces are sound but not ready to be used as direct replacements for the original items. To improve test quality, simplifications can instead be suggested to item authors at the time of digital question authoring.

Lay Description:
What is known about the subject:
- Difficult-to-read math assessment items cause measurement and equity issues.
- The readability of math word problems is negatively correlated with outcomes.

What our paper adds:
- The GPT-3 AI system is capable of simplifying math word problems.
- Prompt-based and few-shot learning-based approaches can create meaningful simplifications, but with a very low accuracy rate.
- Text similarity and readability measures can be used to filter out noisy outputs and discover interesting simplifications.

What are the implications for practitioners:
- GPT-3 is capable of generating relevant math word problem simplifications.
- Simplified text can be offered as suggestions to question authors during the digital item authoring process.
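The pipeline the abstract describes (few-shot prompting followed by similarity- and readability-based filtering) can be illustrated with a minimal Python sketch. This is not the authors' code: the few-shot example, the threshold values, and the `generate` callable (standing in for a GPT-3 completion request) are all hypothetical, and the snippet assumes the `textstat` package's Flesch-Kincaid grade level and `difflib.SequenceMatcher` as stand-ins, since the abstract does not name the specific readability or similarity measures used.

```python
from difflib import SequenceMatcher

import textstat  # readability metrics such as Flesch-Kincaid grade level

# Hypothetical few-shot prompt: a hand-written simplification pair
# followed by the item to simplify (placeholder text, not from the paper).
FEW_SHOT_PROMPT = """Simplify the math word problem.

Original: A baker prepared 4 dozen muffins and distributed them equally among 6 boxes. How many muffins are in each box?
Simple: A baker made 48 muffins. He put the same number in each of 6 boxes. How many muffins are in each box?

Original: {item}
Simple:"""


def simplify(item: str, generate) -> str:
    """Ask the language model for a simplification.

    `generate` is a hypothetical prompt -> completion callable wrapping
    a GPT-3 request; it is injected so the sketch stays API-agnostic.
    """
    return generate(FEW_SHOT_PROMPT.format(item=item)).strip()


def is_meaningful(original: str, simplified: str,
                  min_similarity: float = 0.5,
                  min_grade_drop: float = 0.5) -> bool:
    """Filter noisy outputs with similarity and readability thresholds.

    Keep a candidate only if it stays textually close to the input
    (i.e., it is not an unrelated generation) and it actually reads at
    a lower grade level. Threshold values are illustrative.
    """
    similarity = SequenceMatcher(None, original, simplified).ratio()
    grade_drop = (textstat.flesch_kincaid_grade(original)
                  - textstat.flesch_kincaid_grade(simplified))
    return similarity >= min_similarity and grade_drop >= min_grade_drop
```

Candidates that pass such a filter would be surfaced to item authors as authoring-time suggestions rather than applied automatically, in line with the paper's takeaway.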
ISSN: 0266-4909, 1365-2729
DOI: 10.1111/jcal.12776