Survey of Automatic Labeling Methods for Topic Models

Bibliographic Details
Published in: Jisuanji kexue yu tansuo, Vol. 17, no. 12, pp. 2861-2879
Authors: HE Dongbin, TAO Sha, ZHU Yanhong, REN Yanzhao, CHU Yunxia
Format: Journal Article
Language: Chinese
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press, 1 December 2023
Summary: Topic models are widely used to model unstructured corpora and other discrete data and to extract their latent topics. Because a topic is usually presented as a list of words, users often find it hard to grasp what the topic means, especially when they lack knowledge of the subject area. Labeling topics manually can produce labels that are more explanatory and easier to understand, but the cost is too high for this to be practical, so research on automatically labeling discovered topics offers a way to address the problem. The survey first describes and analyzes the currently most popular topic model, latent Dirichlet allocation (LDA). It then classifies topic labeling methods into three types according to the form of the label they produce: phrases, textual summaries (abstracts), or images. Next, focusing on how each type of generated label improves topic interpretability, it reviews, analyzes, and summarizes the relevant research of recent years.
ISSN: 1673-9418
DOI: 10.3778/j.issn.1673-9418.2303083
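To make the summary's two central ideas concrete (a topic shown as a word list, and a phrase-based label chosen for it), here is a minimal Python sketch. It assumes a topic is given as a word-to-probability mapping, such as one row of an LDA model's topic-word distribution, and it scores a set of hypothetical candidate phrases by the average probability of their words. The function names, the toy topic, and the averaging heuristic are illustrative assumptions, not one of the methods reviewed in the survey.

```python
from typing import Dict, List, Tuple


def top_words(topic: Dict[str, float], n: int) -> List[str]:
    """Return the n highest-probability words: the usual 'word list' view of a topic.

    Ties are broken alphabetically so the output is deterministic.
    """
    ranked = sorted(topic.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def score_phrase(phrase: str, topic: Dict[str, float]) -> float:
    """Naive illustrative heuristic (an assumption, not a published method):
    average topic probability of the phrase's words; unseen words count as 0."""
    words = phrase.lower().split()
    if not words:
        return 0.0
    return sum(topic.get(w, 0.0) for w in words) / len(words)


def best_label(topic: Dict[str, float], candidates: List[str]) -> Tuple[str, float]:
    """Pick the candidate phrase whose words best match the topic distribution."""
    scored = [(c, score_phrase(c, topic)) for c in candidates]
    return max(scored, key=lambda cs: cs[1])


if __name__ == "__main__":
    # Hypothetical topic-word probabilities, e.g. one topic learned by LDA.
    topic = {"stock": 0.12, "market": 0.10, "price": 0.08,
             "investor": 0.06, "trade": 0.05, "bank": 0.03}
    # Hypothetical candidate labels; real systems generate these from text or external resources.
    candidates = ["stock market", "bank loan", "football match"]
    print(top_words(topic, 3))
    print(best_label(topic, candidates))
```

Averaging rather than summing keeps longer candidate phrases from winning just because they contain more words. Practical phrase-based labelers generally use richer candidate generation and relevance scoring than this; the sketch only shows the basic step of matching a label against a topic's word distribution.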