Study on a prediction of P2P network loan default based on the machine learning LightGBM and XGboost algorithms according to different high dimensional data cleaning

Bibliographic Details
Published in Electronic commerce research and applications Vol. 31; pp. 24-39
Main Authors Ma, Xiaojun, Sha, Jinglan, Wang, Dehua, Yu, Yuanbo, Yang, Qian, Niu, Xueqi
Format Journal Article
Language English
Published Elsevier B.V. 01.09.2018
ISSN 1567-4223
1873-7846
DOI 10.1016/j.elerap.2018.08.002

Summary:
• This study uses new machine learning algorithms to predict default risk.
• Two different methods are used to clean data with too many variables and missing values.
• The two equally sophisticated algorithms are compared.
• Relevant policy recommendations are put forward for global P2P platforms.

Big data and the Internet finance sector have developed tremendously in the 21st century, and national emphasis on this field has gradually increased. Peer-to-peer (P2P) lending is an innovative mode of borrowing that is a powerful complement to the traditional financial industry. Predicting the default rate on credit is a prerequisite for guaranteeing the proper operation of related financial projects and platforms. In this paper, we use ‘multi-observation’ and ‘multi-dimensional’ data cleaning methods and apply two modern machine learning algorithms, LightGBM (introduced in Asia at the end of 2016) and XGBoost, to real P2P transaction data from Lending Club. The default risk of loans on the platform is predicted, and the results of the different methods are compared. We observe that the LightGBM algorithm applied to the multi-observation data set gives the best classification prediction results. The average performance rate on the historical transaction data of the Lending Club platform rises by 1.28 percentage points, corresponding to a reduction in loan defaults of approximately $117 million. Finally, based on the factors that influence the default rate, we offer suggestions for the development of Lending Club and other P2P platforms, as well as a suggested direction for other countries’ development in this field.
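
The summary names the two cleaning strategies but does not define them. As a rough illustration only, the sketch below assumes that ‘multi-observation’ means dropping sparse variables so that more loan records survive, and that ‘multi-dimensional’ means keeping every variable and dropping incomplete records. The function names, the `max_missing` threshold, and the column names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def missing_share(rows: List[Record], column: str) -> float:
    """Fraction of rows in which `column` is missing (None)."""
    return sum(r.get(column) is None for r in rows) / len(rows)


def clean_multi_observation(rows: List[Record], max_missing: float) -> List[Record]:
    """Assumed 'multi-observation' cleaning: drop sparse columns first,
    then drop any row that still has a missing value."""
    columns = list(rows[0])
    kept = [c for c in columns if missing_share(rows, c) <= max_missing]
    reduced = [{c: r[c] for c in kept} for r in rows]
    return [r for r in reduced if all(v is not None for v in r.values())]


def clean_multi_dimensional(rows: List[Record]) -> List[Record]:
    """Assumed 'multi-dimensional' cleaning: keep every column and
    drop any row with a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


# Toy records with hypothetical column names; None marks a missing value.
rows: List[Record] = [
    {"loan_amnt": 10000, "int_rate": 11.5, "months_since_delinq": None},
    {"loan_amnt": 5000, "int_rate": 9.2, "months_since_delinq": None},
    {"loan_amnt": 20000, "int_rate": 15.0, "months_since_delinq": 14},
    {"loan_amnt": 7500, "int_rate": 12.1, "months_since_delinq": None},
]

many_rows = clean_multi_observation(rows, max_missing=0.5)
many_cols = clean_multi_dimensional(rows)

print(len(many_rows), sorted(many_rows[0]))  # more rows, fewer columns
print(len(many_cols), sorted(many_cols[0]))  # fewer rows, all columns
```

Under these assumptions the first data set trades variables for observations and the second does the opposite; either cleaned set could then be passed to a gradient-boosting classifier such as LightGBM or XGBoost.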