Deep Learning Approaches for Detecting Text Generated by Artificial Intelligence


Bibliographic Details
Published in Studia Universitatis Babes-Bolyai: Series Informatica Vol. 69; no. 2
Main Author David BIRIS
Format Journal Article
Language English
Published Babes-Bolyai University, Cluj-Napoca 02.04.2025
Summary: Large language models have been a prominent topic of discussion and research for several years, and they have spread into many industries, especially education. Their popularity among students stems from their ability to give quick and seemingly reliable answers to questions on almost any topic. Using these models to generate schoolwork can be seen as a challenge to academic integrity. We investigate the development of AI systems capable of detecting AI-generated text and explore training different types of deep learning models on a mixed dataset containing essays, both human-written and AI-generated, as well as movie reviews and books. We experimented with LSTM (long short-term memory) networks and with fine-tuning transformer-based models. We achieve results close to the state of the art and, in some cases, surpass a few of these models. For instance, one of our models surpasses a state-of-the-art model on a set of both student-written and generated essays, in accuracy by up to 5% and in F1 score by up to 4%, in two different experiments. Furthermore, another of our models surpasses a state-of-the-art model on a set of essays, but only in terms of precision, and only by 1%. These results indicate the potential of properly fine-tuned transformer-based models, as well as the importance of a well-prepared dataset.
Received by editors: 31 July 2024
2010 Mathematics Subject Classification. 68P15, 94A12
1998 CR Categories and Descriptors. I.2.7 [Artificial Intelligence]: Natural Language Processing – Text Analysis; I.2.6 [Artificial Intelligence]: Learning – Deep Learning; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing – Content Analysis and Feature Selection
ISSN:2065-9601
DOI:10.24193/subbi.2024.2.03