WIP: Comparison of Large Language Models for Applied Mathematics Questions in Engineering Courses

Bibliographic Details
Published in: Proceedings - Frontiers in Education Conference, pp. 1 - 5
Main Authors: Merlos, Carlos; Hussain, Faraz; Kar, Swati; Shri, Lavanya; Olugbenle, Olaoluwayimika; Banavar, Mahesh; AlMomani, Abd AlRahman
Format: Conference Proceeding
Language: English
Published: IEEE, 13.10.2024
Summary: In this Innovative Practice WIP paper, we present a comparison study of large language models (LLMs) to see which respond best to questions posed in classes. While the ultimate goal of the project is to develop classroom chatbots using LLMs, the first step, presented here, is to evaluate different models across multiple topic areas and select a few of them for further development. In this preliminary work, we cover two subject areas: signal processing and differential equations. We take slightly different approaches in each area to better understand the range of capabilities of the LLMs. For signal processing, we use custom open-source LLMs and the free version of OpenAI's ChatGPT 3.5. All models used here are free; however, significant coding effort and processing power are required to implement the models this way. For differential equations, by contrast, we compare the paid version of ChatGPT 4.0 with a custom GPT, also built on the paid version of ChatGPT. In this case, no coding skills are required, but a monthly fee is. We evaluated the LLMs by testing them on question batteries of different difficulty levels. We found that all the models perform well when the questions are simple and drawn directly from the source material. However, as the difficulty of the questions increases, the models with more extensive training and larger parameter counts perform better, making a case for better training, though both approaches come with costs. The results of this evaluation will be presented at the conference. Future work involves developing these models into more interactive chatbots and deploying them in classrooms for preliminary evaluation.
ISSN: 2377-634X
DOI: 10.1109/FIE61694.2024.10893476