Comparison of Rule-Based Models and Large Language Models in Item and Feedback Generation

Bibliographic Details
Published in	과학교육연구지 Vol. 48; no. 3; pp. 154 - 169
Main Authors	정진민, Jinmin Chung, 김성연, Sungyeun Kim
Format	Journal Article
Language	Korean
Published	경북대학교 과학교육연구소 (Science Education Research Institute, Kyungpook National University), 30.12.2024
Subjects	feedback generation; item generation; large language models; number and operation; rule-based models; education

Summary:	This study compares the accuracy of item and feedback generation using rule-based models and large language models, and explores their applicability in real educational settings. As example material, items and feedback were generated for seven achievement levels, with content validity secured, in the middle school “Numbers and Operations” area of the 2022 revised curriculum. The main findings are as follows. First, the accuracy of item and feedback generation with rule-based models was 100%, whereas the accuracy with large language models varied from 0% to 100%. Errors in item generation concerned the answer choices and fell into two categories: items with more than one correct answer and items with no correct answer. Second, for items that repeatedly showed errors, namely those asking for the square root of a square root, it appeared necessary to write the prompts in English or to train on large-scale data. Third, for efficient customized assessment, generating items with rule-based models secures the accuracy of the assessment, while large language models can generate efficient feedback that reflects each student's affective characteristics, career path, and academic level. Finally, the limitations of the study and directions for future research are discussed.
Bibliography:	Science Education Research Institute, Kyungpook National University; KISTI1.1003/JNL.JAKO202409432403429; http://dx.doi.org/10.21796/jse.2024.48.3.154
ISSN:	1225-3944 2733-4074
DOI:	10.21796/jse.2024.48.3.154
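
The summary reports two kinds of item error, both tied to the answer choices: more than one correct choice, and no correct choice. The following is a minimal sketch of how a rule-based item template and a check for those two errors might look. It is not the authors' implementation; the template, the distractor rules, and the names `square_roots`, `make_item`, and `classify` are all hypothetical and chosen only for illustration.

```python
import math
import random


def square_roots(n):
    """Return both integer square roots of a positive perfect square n."""
    r = math.isqrt(n)
    if r * r != n:
        raise ValueError("n must be a perfect square")
    return frozenset({r, -r})


def make_item(k, rng):
    """Build one multiple-choice item from a fixed template (hypothetical rule).

    The stem asks for the square roots of sqrt(k**4); each choice is the set
    of values it claims. Use k >= 2 so that all five choices are distinct.
    """
    radicand = k ** 4
    choices = [
        frozenset({k, -k}),                # intended key
        frozenset({k * k, -k * k}),        # distractor: roots of the radicand itself
        frozenset({k}),                    # distractor: negative root forgotten
        frozenset({k * k}),                # distractor: only the inner root
        frozenset({radicand, -radicand}),  # distractor: no root taken at all
    ]
    rng.shuffle(choices)
    stem = f"What are the square roots of sqrt({radicand})?"
    return {"stem": stem, "radicand": radicand, "choices": choices}


def classify(item):
    """Label an item 'ok', 'multiple_correct', or 'no_correct'.

    The true answer is recomputed from the radicand, independently of the
    order or content of the choices.
    """
    truth = square_roots(math.isqrt(item["radicand"]))
    n_correct = sum(1 for choice in item["choices"] if choice == truth)
    if n_correct == 1:
        return "ok"
    return "multiple_correct" if n_correct > 1 else "no_correct"


rng = random.Random(0)
labels = [classify(make_item(k, rng)) for k in range(2, 6)]
```

Because the key and the distractors come from fixed rules, every item built this way has exactly one correct choice by construction. The same `classify` check could in principle be run on items produced by a language model, provided their choices are first parsed into sets of values.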