Topic Modeling Techniques for Text Mining Over a Large-Scale Scientific and Biomedical Text Corpus

Bibliographic Details
Published in: International Journal of Ambient Computing and Intelligence, Vol. 13, no. 1, pp. 1-18
Main Authors: Chauhan, Ritu; Acharjya, Debi Prasanna; Avasthi, Sandhya
Format: Journal Article
Language: English
Published: Hershey: IGI Global, 2022
ISSN: 1941-6237, 1941-6245
DOI: 10.4018/IJACI.293137

Summary: Topic models are effective at extracting central themes from large-scale document collections, and topic modeling is an active research area. This study compares state-of-the-art techniques: Latent Dirichlet Allocation (LDA), Correlated Topic Model (CTM), Hierarchical Dirichlet Process (HDP), Dirichlet Multinomial Regression (DMR), and the Hierarchical Pachinko Allocation (HPA) model. Abstracts of articles from different periods were collected from the PubMed library using the keywords adolescence substance use and depression. Thousands of articles on this subject are available on PubMed, so extracting information from the collection by hand is very time-consuming. The extracted text was used to fit the topic models, and the fitted models were evaluated using both likelihood and non-likelihood measures. The models are compared on log-likelihood and perplexity, and topic coherence measures are used to evaluate the quality of the topics.
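
The evaluation measures named in the summary can be illustrated with a minimal sketch. It is not taken from the article: the function names are hypothetical, and the UMass-style formula is only one common coherence measure, chosen here as an assumption because the summary does not say which coherence measures the authors used. Perplexity is computed from a model's total log-likelihood in the usual way, as the exponential of the negative per-token log-likelihood.

```python
import math
from itertools import combinations


def perplexity(total_log_likelihood, n_tokens):
    """Perplexity = exp(-log-likelihood per token); lower is better."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return math.exp(-total_log_likelihood / n_tokens)


def umass_coherence(top_words, documents):
    """UMass-style coherence of one topic (assumed measure, see lead-in).

    top_words: topic words ordered from most to least probable.
    documents: tokenized documents (lists of words).
    Sums log((D(w_hi, w_lo) + 1) / D(w_hi)) over word pairs, where D counts
    documents containing the given words and w_hi is the higher-ranked word.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i, j in combinations(range(len(top_words)), 2):
        w_hi, w_lo = top_words[i], top_words[j]
        denom = doc_freq(w_hi)
        if denom == 0:
            continue  # skip words that never occur in the corpus
        score += math.log((doc_freq(w_hi, w_lo) + 1) / denom)
    return score


# Toy corpus, invented purely for illustration.
docs = [
    ["depression", "adolescent"],
    ["depression", "adolescent"],
    ["substance", "use"],
]
coherent = umass_coherence(["depression", "adolescent"], docs)
incoherent = umass_coherence(["depression", "substance"], docs)
```

On this toy corpus the topic whose words co-occur in the same documents receives the higher coherence score, which is the behavior a coherence measure is meant to capture. Log-likelihood and perplexity, by contrast, depend only on how well the fitted model predicts the tokens.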