Deep learning in the marking of medical student short answer question examinations: Student perceptions and pilot accuracy assessment

Bibliographic Details
Published in: Focus on Health Professional Education, Vol. 24, No. 1, pp. 38-48
Main Authors: L Hollis-Sando, C Pugh, K Franke, T Zerner, Y Tan, G Carneiro, A van den Hengel, I Symonds, P Duggan, S Bacchi
Format: Journal Article
Language: English
Published: Adelaide: Australian and New Zealand Association for Health Professional Educators, 01.03.2023

Summary:
Introduction: Machine learning has previously been applied to text analysis. There are limited data regarding the acceptability or accuracy of such applications in medical education. This project examined medical student opinion of computer-based marking and evaluated the accuracy of deep learning (DL), a subtype of machine learning, in the scoring of medical short answer questions (SAQs).
Methods: Fourth- and fifth-year medical students undertook an anonymised online examination. Before the examination, students completed a survey gauging their opinion of computer-based marking. Questions were marked by humans, and a DL analysis was then conducted using convolutional neural networks. In the DL analysis, following preprocessing, data were split into a training dataset (on which models were developed using 10-fold cross-validation) and a test dataset (on which performance analysis was conducted).
Results: One hundred and eighty-one students completed the examination (participation rate 59.0%). While students expressed concern regarding the accuracy of computer-based marking, the majority agreed that computer marking would be more objective than human marking (67.0%) and reported that they would not object to computer-based marking (55.5%). For 1-mark SAQs, automated marking achieved consistently high classification accuracy (mean 0.98). For the more complex 2-mark and 3-mark SAQs, which required multiclass classification, accuracy was lower (mean 0.65 and 0.59, respectively).
Conclusions: Medical students may be supportive of computer-based marking because of its objectivity. DL has the potential to mark written questions accurately; however, further research into DL marking of medical examinations is required.
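The methods described above follow a standard supervised text-classification shape: answers are preprocessed, a convolutional neural network is developed on a training split using 10-fold cross-validation, and classification accuracy is then reported on a held-out test split. The sketch below illustrates that general pipeline only; the paper's actual preprocessing, network architecture and hyperparameters are not stated in this record, so every name, value and the toy data here are illustrative assumptions, not the authors' method.

    # Minimal, hypothetical sketch in Python (TensorFlow/Keras, scikit-learn).
    # Nothing here reproduces the study's real model; all values are assumed.
    import numpy as np
    from sklearn.model_selection import StratifiedKFold, train_test_split
    from tensorflow.keras import layers, models

    def build_cnn(vocab_size, seq_len, n_classes):
        # A small 1-D convolutional text classifier (one plausible design).
        model = models.Sequential([
            layers.Input(shape=(seq_len,)),
            layers.Embedding(vocab_size, 64),
            layers.Conv1D(128, 5, activation="relu"),
            layers.GlobalMaxPooling1D(),
            layers.Dense(64, activation="relu"),
            layers.Dense(n_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    # Toy stand-in data: integer-encoded answer texts X, awarded marks y.
    rng = np.random.default_rng(0)
    X = rng.integers(1, 1000, size=(200, 50))
    y = rng.integers(0, 2, size=200)  # e.g. 0 or 1 mark for a 1-mark SAQ

    # Held-out test split, mirroring the abstract's train/test design.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)

    # Model development with 10-fold cross-validation on the training set.
    fold_acc = []
    for tr_idx, va_idx in StratifiedKFold(
            n_splits=10, shuffle=True, random_state=0).split(X_tr, y_tr):
        model = build_cnn(vocab_size=1000, seq_len=50, n_classes=2)
        model.fit(X_tr[tr_idx], y_tr[tr_idx],
                  epochs=3, batch_size=32, verbose=0)
        fold_acc.append(
            model.evaluate(X_tr[va_idx], y_tr[va_idx], verbose=0)[1])
    print(f"mean cross-validation accuracy: {np.mean(fold_acc):.2f}")

    # Final performance analysis on the untouched test set.
    model = build_cnn(vocab_size=1000, seq_len=50, n_classes=2)
    model.fit(X_tr, y_tr, epochs=3, batch_size=32, verbose=0)
    print(f"test accuracy: {model.evaluate(X_te, y_te, verbose=0)[1]:.2f}")

Stratified folds are used here so that each fold preserves the proportion of answers at each mark, which is the kind of consideration that matters most for the multiclass 2-mark and 3-mark questions, where some mark levels may be rare.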
Bibliography: Refereed article. Includes bibliographical references.
Focus on Health Professional Education: A Multi-Professional Journal, Vol. 24, No. 1, Mar 2023, 38-48
Informit, Melbourne (Vic)
ISSN: 1442-1100
2204-7662
DOI: 10.11157/fohpe.v24i1.531