An Efficient Resampling Technique for Financial Statements Fraud Detection: A Comparative Study

Bibliographic Details
Published in: 2023 3rd International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), pp. 1 - 7
Main Authors: Ashtiani, Matin N.; Raahemi, Bijan
Format: Conference Proceeding
Language: English
Published: IEEE, 19.07.2023
More Information
Summary: Financial statement fraud detection is the process of identifying falsified financial statements. Traditional auditing methods are time-consuming, expensive, and error-prone, so adopting an efficient and robust machine learning approach is important. Unfortunately, current data sources suffer from severe class imbalance: the scarcity of fraudulent financial statement records motivates the use of resampling techniques. This paper a) examines the efficiency of different resampling strategies for detecting fraudulent financial statements using multi-layer feedforward neural networks, support vector machines, and naïve Bayes models, and b) investigates whether Raw Accounting Variables (RAVs) are superior to financial ratios for financial statement fraud detection. A benchmark dataset of numerical financial variables (RAVs and financial ratios) is used as features for model evaluation. The fraud labels correspond to the Accounting and Auditing Enforcement Releases issued by the U.S. Securities and Exchange Commission (SEC). We analyze the performance of the models on 28 RAVs and 14 financial ratios suggested by accounting experts. Using the area under the receiver operating characteristic curve (AUC) as the performance metric, the synthetic minority oversampling technique (SMOTE) combined with a three-layer feedforward neural network (AUC: 0.863) substantially outperformed the RUSBoost model (AUC: 0.717).
DOI: 10.1109/ICECCME57830.2023.10253185
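
The pipeline described in the summary (oversample the rare fraud class with SMOTE, train a feedforward neural network, score with AUC) can be illustrated with a minimal sketch. This is not the authors' code: the synthetic data, the 64/32/16 hidden-layer sizes, and the use of scikit-learn and imbalanced-learn are assumptions chosen only to show the shape of such an experiment.

```python
# Hedged sketch of a SMOTE + feedforward-network fraud-detection experiment.
# Data, layer sizes, and hyperparameters are illustrative assumptions,
# not values taken from the paper.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 28 raw accounting variables with severe class
# imbalance (fraud cases are the rare positive class).
X, y = make_classification(n_samples=5000, n_features=28,
                           weights=[0.97, 0.03], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Scale features, then apply SMOTE only to the training split so the
# test set keeps its original imbalance.
scaler = StandardScaler().fit(X_train)
X_train_res, y_train_res = SMOTE(random_state=0).fit_resample(
    scaler.transform(X_train), y_train)

# Feedforward network with three hidden layers as a stand-in for the
# paper's multi-layer architecture.
clf = MLPClassifier(hidden_layer_sizes=(64, 32, 16), max_iter=500,
                    random_state=0)
clf.fit(X_train_res, y_train_res)

# Evaluate with AUC, the metric used in the study.
auc = roc_auc_score(y_test,
                    clf.predict_proba(scaler.transform(X_test))[:, 1])
print(f"AUC on the held-out test set: {auc:.3f}")
```

Swapping `SMOTE` for another resampler (e.g. random undersampling) or the `MLPClassifier` for an SVM or naïve Bayes model reproduces the kind of comparison the paper reports, though actual results depend on the SEC-labeled benchmark dataset it uses.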