Deep Learning Approaches for Detecting Text Generated by Artificial Intelligence
Published in | Studia Universitatis Babes-Bolyai: Series Informatica Vol. 69; no. 2 |
---|---|
Format | Journal Article |
Language | English |
Published | Babes-Bolyai University, Cluj-Napoca, 02.04.2025 |
Summary: | Large language models have been a prominent topic of discussion and research for several years and have spread into many industries, especially education. Their popularity among students stems from their ability to give quick and reliable answers to questions on any topic. The use of these models to generate schoolwork can be seen as a challenge to academic integrity. We investigate the development of AI capable of detecting AI-generated text and explore training different types of deep learning models on a mixed dataset containing essays, both human-written and AI-generated, as well as movie reviews and books. We experiment with LSTM (Long Short-Term Memory) networks and with fine-tuning transformer-based models. We achieve results close to the state of the art and, in some cases, surpass a few of these models. For instance, one of our models surpasses a state-of-the-art model on a set of student-written and generated essays by up to 5% in accuracy and by up to 4% in F1 score, in two different experiments. Furthermore, another of our models surpasses a state-of-the-art model on a set of essays, but only in terms of precision, and only by 1%. These results indicate the potential of properly fine-tuned transformer-based models, as well as the importance of a well-prepared dataset. Received by editors: 31 July 2024. 2010 Mathematics Subject Classification: 68P15, 94A12. 1998 CR Categories and Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing – Text Analysis; I.2.6 [Artificial Intelligence]: Learning – Deep Learning; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing – Content Analysis and Feature Selection. |
---|---|
ISSN: | 2065-9601 |
DOI: | 10.24193/subbi.2024.2.03 |