Comparative Study of Different Machine Learning Models for Customer Churn Analysis Using SMOTE and Feature Variation Along With Customer Segmentation

Bibliographic Details
Published in: 2023 International Conference on Modeling, Simulation & Intelligent Computing (MoSICom), pp. 637-642
Main Authors: Thankam, Mary Shana; El Gayar, Neamat
Format: Conference Proceeding
Language: English
Published: IEEE, 07.12.2023
Summary: Customer churn is a major issue for companies in both online and offline markets, and it adversely affects profit and revenue. Machine learning (ML) has recently been used to analyze and predict customer churn. This research studies churn prediction with a special focus on feature selection and unbalanced datasets. Moreover, churn analysis has dealt mainly with prediction rather than with methods to retain customers. In our study, we use a customer dataset from a US telecom company. We compare several classifiers for churn prediction, including logistic regression, decision trees, SVM, random forest, k-NN, and XGBoost. Methods to retain customers are also discussed. The paper highlights the importance of feature selection and presents a detailed experimental study of model performance on balanced and unbalanced datasets. Comparing F1 scores, AUC scores, and precision-recall curves shows that XGBoost outperformed all the other algorithms. Retaining customers, in turn, requires careful study of their behavioral patterns. Customer segmentation is an effective way for marketing teams to identify different groups of customers. In this paper, k-means, agglomerative clustering, Gaussian mixture (GM), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) are used to cluster the customers into segments. We evaluate the clustering results using silhouette analysis.
DOI: 10.1109/MoSICom59118.2023.10458848
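
Illustration: the title and summary refer to SMOTE and to balanced versus unbalanced datasets, but this record does not describe how the authors applied it. The sketch below is a simplified, standard-library-only variant of the SMOTE idea: a synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote_oversample`, its parameters, and the toy data are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature vectors (the rare class).
    k: number of nearest minority neighbours considered for each sample.
    Hypothetical helper for illustration; not the paper's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute each sample's k nearest neighbours within the minority class.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # pick a minority sample at random
        j = rng.choice(neighbours[i])      # pick one of its nearest neighbours
        gap = rng.random()                 # position along the segment i -> j
        x, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy data (made up for illustration): 3 churners vs. 9 non-churners.
churners = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
non_churners = [[5.0 + 0.1 * i, 6.0 - 0.1 * i] for i in range(9)]

new_points = smote_oversample(
    churners, n_new=len(non_churners) - len(churners), k=2
)
balanced_minority = churners + new_points
print(len(balanced_minority), len(non_churners))  # prints: 9 9
```

Oversampling of this kind is typically applied to the training split only, so that evaluation metrics such as F1 and AUC are still computed on data with the original class distribution.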