Double-weight LDA extracting keywords for financial fraud detection system
Published in | Multimedia tools and applications Vol. 83; no. 17; pp. 50757 - 50781 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | New York, Springer US, 01.05.2024, Springer Nature B.V |
Subjects | |
Summary: | Financial fraud has a widespread impact, from everyday life to the financial industry: it reduces confidence in the industry and destabilizes a country’s economy. It is therefore important to develop an intelligent financial fraud detection system for early warning and prevention. This study proposes a double-weight latent Dirichlet allocation (DW-LDA) to extract keywords from financial fraud data and then uses five intelligent classifiers to build an intelligent text fraud detection model. Financial fraud datasets usually contain more non-fraud cases than fraud cases, that is, they are imbalanced; hence, this study uses the synthetic minority oversampling technique (SMOTE) and random undersampling to handle the imbalance. For verification, the study collected the Enron email and MD&A datasets to compare the performance of related topic models and weighted LDA variants (TFIDF + LDA and PMI + LDA) with the proposed DW-LDA after SMOTE handling. Model performance is evaluated with accuracy, recall, precision, F-score, and AUC, and the results show that the proposed DW-LDA (TFIDF + PMI + LDA) performs better than the listed topic models. For visual information representation, graphs such as word clouds of fraudulent emails and keywords show the important results. The research results and the resulting intelligent text fraud detection model can serve as a reference for investors and stakeholders. |
---|---|
ISSN: | 1573-7721 1380-7501 |
DOI: | 10.1007/s11042-023-17334-1 |
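
The summary mentions SMOTE and random undersampling for handling the imbalance between fraud and non-fraud cases. As a rough illustration only, the following minimal sketch shows the general idea of both techniques on toy two-dimensional points. It is not the authors' implementation: the function names `smote` and `random_undersample`, the toy data, and the parameter choices (`k`, `n_new`, `n_keep`, `seed`) are all assumptions made for this example.

```python
import random
from math import dist


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (illustrative sketch).

    Each synthetic point lies on the line segment between a randomly
    chosen minority point and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority points
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(x, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of n_keep majority points, without replacement."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)


# Hypothetical toy data: 4 "fraud" points and 20 "non-fraud" points.
fraud = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
non_fraud = [[float(i), float(-i)] for i in range(5, 25)]

new_fraud = smote(fraud, n_new=6, k=2)
kept_non_fraud = random_undersample(non_fraud, n_keep=10)
balanced_fraud = fraud + new_fraud

print(len(balanced_fraud), len(kept_non_fraud))
```

In this sketch both classes end up with ten points each. In practice, oversampling is usually applied to the training split only, so that synthetic points do not leak into the evaluation data.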