Topic modeling for conversations for mental health helplines with utterance embedding

Bibliographic Details
Published in: Telematics and Informatics Reports, Vol. 13, p. 100126
Main Authors: Salmi, Salim; van der Mei, Rob; Mérelle, Saskia; Bhulai, Sandjai
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.03.2024
ISSN: 2772-5030
DOI: 10.1016/j.teler.2024.100126

Summary: Conversations whose topics are locally contextual often produce incoherent topic modeling results with standard methods. Splitting a conversation into its individual utterances makes it possible to avoid this problem; however, the increased data sparsity means that different methods need to be considered. Baseline bag-of-words topic modeling methods for regular and short text, as well as topic modeling methods using transformer-based sentence embeddings, were implemented. These models were evaluated on topic coherence and word embedding similarity. Each method was trained on single utterances, on segments of the conversation, and on the full conversation. The results showed that utterance-level and segment-level data combined with sentence embedding methods perform better than non-sentence-embedding methods or conversation-level data. Among the sentence embedding methods, clustering with HDBSCAN showed the best performance. We suspect that ignoring noisy utterances is the reason for the better topic coherence and the relatively large improvement in topic word similarity.
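The utterance-level pipeline outlined in the summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the sentence-transformers, hdbscan, and scikit-learn packages, an illustrative embedding model ("all-MiniLM-L6-v2"), and a simple per-cluster term-frequency step in place of the paper's topic-word extraction; the function name and parameter values are hypothetical.

from collections import defaultdict

import hdbscan
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import CountVectorizer


def utterance_topics(utterances, min_cluster_size=15, top_n=10):
    """Cluster utterance embeddings and return top terms per cluster (hypothetical helper)."""
    # Transformer-based sentence embedding of each utterance.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
    embeddings = encoder.encode(utterances, show_progress_bar=False)

    # Density-based clustering; HDBSCAN labels low-density points as -1 (noise),
    # so uninformative utterances are effectively ignored.
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(embeddings)

    # Group utterances by cluster, skipping noise.
    grouped = defaultdict(list)
    for text, label in zip(utterances, labels):
        if label != -1:
            grouped[label].append(text)

    # Surface the most frequent terms in each cluster as its topic words.
    topics = {}
    for label, docs in grouped.items():
        vectorizer = CountVectorizer(stop_words="english")
        counts = vectorizer.fit_transform(docs)
        totals = np.asarray(counts.sum(axis=0)).ravel()
        terms = vectorizer.get_feature_names_out()
        topics[label] = [terms[i] for i in totals.argsort()[::-1][:top_n]]
    return topics

Topic coherence and word-embedding similarity, the evaluation measures named in the summary, would then be computed over the resulting topic-word lists.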