Automating Lexicon Generation: A Comprehensive Review of Alternative Approaches

Bibliographic Details
Published in: Procedia Computer Science, Vol. 225, pp. 1142-1150
Main Authors: Ghali, Julien Pierre Edmond; Inuzuka, Nobuhiro; Shima, Kosuke; Moriyama, Koichi; Mutoh, Atsuko
Format: Journal Article
Language: English
Publisher: Elsevier B.V., 2023
Summary: Lexicon-based approaches to document classification are widely used, but manually constructing lexicons is time-consuming and resource-intensive. In this paper, we propose methods for automating the generation of lexicons that are then used for document classification. We explore diverse lexicon-generation methods, including semantic matching, frequency-based approaches, machine learning algorithms, and large language model techniques, and then use the resulting lexicons to classify documents by their content. By comparing the results of the different lexicons on the same task, using criteria such as scalability and F1 score, we identify the use cases for which each method is best suited. We show that our automated approaches are effective and efficient, producing accurate classifications with minimal human intervention. Some approaches have the potential to streamline the document classification process by reducing the time and resources required for manual lexicon generation. Finally, we discuss the obtained results.
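The summary names frequency-based lexicon generation, lexicon-based classification, and F1 scoring without showing how they fit together. The following is a minimal Python sketch of that pipeline under stated assumptions. It is not the authors' method: the stopword list, the scoring rule (in-class word count minus out-of-class count), the `top_k` cutoff, the alphabetical tie-breaking, and the toy corpus are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a proper one.
STOPWORDS = {"the", "a", "on", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords (a naive assumption)."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def build_lexicon(docs, labels, top_k=5):
    """Hypothetical frequency-based lexicon: for each class, keep the top_k
    words whose count inside the class most exceeds their count elsewhere."""
    lexicon = {}
    for cls in sorted(set(labels)):
        inside, outside = Counter(), Counter()
        for doc, lab in zip(docs, labels):
            (inside if lab == cls else outside).update(tokenize(doc))
        scores = {w: c - outside[w] for w, c in inside.items()}
        # Keep only class-indicative words; break score ties alphabetically.
        ranked = sorted((w for w, s in scores.items() if s > 0),
                        key=lambda w: (-scores[w], w))
        lexicon[cls] = set(ranked[:top_k])
    return lexicon


def classify(doc, lexicon):
    """Pick the class whose lexicon matches the most tokens; on a tie,
    max() over the sorted class names keeps the alphabetically first one."""
    tokens = tokenize(doc)
    hits = {cls: sum(t in words for t in tokens) for cls, words in lexicon.items()}
    return max(sorted(hits), key=lambda cls: hits[cls])


def f1(y_true, y_pred, positive):
    """F1 score for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    train_docs = [
        "the match ended with a late goal",
        "the striker scored a goal",
        "the court ruled on the appeal",
        "the judge dismissed the appeal",
    ]
    train_labels = ["sport", "sport", "law", "law"]
    lexicon = build_lexicon(train_docs, train_labels)

    test_docs = ["a late goal decided the match",
                 "the appeal court ruled",
                 "the judge watched the goal"]
    test_labels = ["sport", "law", "sport"]
    preds = [classify(d, lexicon) for d in test_docs]
    print(preds)  # ['sport', 'law', 'law']: the last document ties 1-1 and falls to "law"
    print(round(f1(test_labels, preds, "sport"), 3))  # 0.667
```

The misclassified third document shows why per-class F1 is a useful comparison criterion: a small frequency-based lexicon is cheap to build, but ties and missing vocabulary surface directly as lost recall or precision.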
ISSN: 1877-0509
DOI: 10.1016/j.procs.2023.10.102