Using Zero-shot Prompting in the Automatic Creation and Expansion of Topic Taxonomies for Tagging Retail Banking Transactions

Bibliographic Details
Published in: arXiv.org
Main Authors: de S Moraes, Daniel; Santos, Pedro T C; da Costa, Polyana B; Pinto, Matheus A S; Ivan de J P Pinto; Álvaro M G da Veiga; Colcher, Sergio; Busson, Antonio J G; Rocha, Rafael H; Gaio, Rennan; Miceli, Rafael; Tourinho, Gabriela; Rabaioli, Marcos; Santos, Leandro; Marques, Fellipe; Favaro, David
Format: Paper, Journal Article
Language: English
Published: Ithaca, Cornell University Library, arXiv.org, 11.02.2024
Summary: This work presents an unsupervised method for automatically constructing and expanding topic taxonomies using instruction-based fine-tuned LLMs (Large Language Models). We apply topic modeling and keyword extraction techniques to create initial topic taxonomies, then use LLMs to post-process the resulting terms and create a hierarchy. To expand an existing taxonomy with new terms, we use zero-shot prompting to determine where to add new nodes; to our knowledge, this is the first work to present such an approach for taxonomy tasks. We use the resulting taxonomies to assign tags that characterize merchants in a retail bank dataset. To evaluate our work, we asked 12 volunteers to answer a two-part form that first assessed the quality of the taxonomies created and then the tags assigned to merchants based on those taxonomies. The evaluation revealed a coherence rate exceeding 90% for the chosen taxonomies. Taxonomy expansion with LLMs also showed promising results for parent node prediction, with an F1-score above 70% in our taxonomies.
ISSN: 2331-8422
DOI: 10.48550/arxiv.2401.06790
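
The zero-shot expansion step mentioned in the summary (asking an LLM where a new term belongs in an existing taxonomy) can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the dictionary representation of the taxonomy, and the names `build_parent_prompt`, `parse_parent`, and `expand_taxonomy` are all assumptions made for illustration, and the `llm` argument is a hypothetical callable standing in for whatever model is used.

```python
from typing import Callable, Dict, List, Optional


def build_parent_prompt(term: str, candidates: List[str]) -> str:
    """Build a zero-shot prompt (no examples) asking for a parent node.

    The wording is an illustrative assumption, not the paper's prompt.
    """
    options = "\n".join(f"- {c}" for c in candidates)
    return (
        "You are organizing a topic taxonomy for retail banking merchants.\n"
        f"Existing nodes:\n{options}\n"
        f"Which existing node is the best parent for the new term '{term}'?\n"
        "Answer with exactly one node name from the list."
    )


def parse_parent(answer: str, candidates: List[str]) -> Optional[str]:
    """Map a free-text model answer back to one of the candidate nodes."""
    normalized = answer.strip().strip(".'\"").lower()
    # Prefer an exact match on the whole answer.
    for c in candidates:
        if c.lower() == normalized:
            return c
    # Otherwise accept the longest candidate mentioned inside the answer.
    for c in sorted(candidates, key=len, reverse=True):
        if c.lower() in normalized:
            return c
    return None


def expand_taxonomy(
    taxonomy: Dict[str, List[str]],
    term: str,
    llm: Callable[[str], str],  # hypothetical: takes a prompt, returns text
) -> Optional[str]:
    """Attach `term` under the parent predicted by the model, if any.

    `taxonomy` maps each node name to the list of its children.
    Returns the chosen parent, or None when the answer is unusable.
    """
    candidates = list(taxonomy)
    parent = parse_parent(llm(build_parent_prompt(term, candidates)), candidates)
    if parent is None:
        return None
    if term not in taxonomy[parent]:
        taxonomy[parent].append(term)
    taxonomy.setdefault(term, [])  # the new term becomes a node itself
    return parent
```

With a real model, `llm` would wrap the model's text-generation call. Restricting the answer to the listed node names and validating it in `parse_parent` keeps an off-list reply from silently creating a wrong edge.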