Double-weight LDA extracting keywords for financial fraud detection system

Bibliographic Details
Published in: Multimedia Tools and Applications, Vol. 83, no. 17, pp. 50757-50781
Main Authors: Cheng, Ching-Hsue; Cai, Wen-Hong
Format: Journal Article
Language: English
Published: New York, Springer US, 01.05.2024
Springer Nature B.V.
Summary: Financial fraud has a widespread impact, from everyday life to the financial industry; it reduces confidence in the industry and destabilizes a country’s economy. It is therefore important to develop an intelligent financial fraud detection system for early warning and prevention. This study proposes a double-weight latent Dirichlet allocation (DW-LDA) to extract keywords from financial fraud data and then uses five intelligent classifiers to build an intelligent text fraud detection model. Because financial fraud datasets usually contain more non-fraud cases than fraud cases, they are imbalanced; the study therefore uses the synthetic minority oversampling technique (SMOTE) and random undersampling to handle the imbalance. For verification, the study collected the Enron email and MD&A datasets and compared the performance of related topic models and weighted LDA variants (TFIDF + LDA and PMI + LDA) with the proposed DW-LDA after SMOTE handling. Model performance is evaluated with accuracy, recall, precision, F-score, and AUC, and the results show that the proposed DW-LDA (TFIDF + PMI + LDA) performs better than the listed topic models. Important results are presented visually, for example as word clouds of the fraudulent emails and keywords. The research results and the resulting intelligent text fraud detection model can serve as a reference for investors and stakeholders.
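The summary names SMOTE and random undersampling as the imbalance-handling steps. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation: the function names `smote` and `random_undersample`, the toy 2-D feature vectors, and the parameter values are all assumptions made for illustration. SMOTE is shown in its basic form, where each synthetic point is interpolated between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of n_keep majority samples, without replacement."""
    return random.Random(seed).sample(majority, n_keep)


# Toy feature vectors (assumed values, standing in for per-document features).
fraud = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
non_fraud = [[5.0 + 0.1 * i, 5.0 - 0.1 * i] for i in range(12)]

# Oversample the minority class and undersample the majority class
# so that both classes end up with six samples.
fraud_balanced = fraud + smote(fraud, n_new=3, k=2)
non_fraud_balanced = random_undersample(non_fraud, n_keep=6)
```

Because every synthetic point is a convex combination of two real fraud samples, the oversampled class stays inside the region already occupied by the minority data. In practice a maintained library implementation would likely be preferable to this hand-written version.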
ISSN: 1573-7721, 1380-7501
DOI: 10.1007/s11042-023-17334-1