A Bayesian Possibilistic C-Means clustering approach for cervical cancer screening


Bibliographic Details
Published in Information Sciences Vol. 501; pp. 495-510
Main Authors Li, Fang-Qi; Wang, Shi-Lin; Liu, Gong-Shen
Format Journal Article
Language English
Published Elsevier Inc 01.10.2019
Summary:
• Propose a Bayesian Possibilistic C-Means clustering for missing attribute estimation.
• The proposed BPCM can discover the underlying patterns in data.
• Propose a fuzzy ensemble learning scheme for cervical cancer screening.

Cervical cancer has recently received considerable attention because of its high lethality and morbidity, and early screening for the disease is vitally important. In this paper, we propose an automatic cervical cancer screening algorithm that analyzes the related risk factors to provide preliminary diagnostic information for medical practitioners. In cervical cancer screening, a number of risk factors are considered highly private or sensitive, and many patients elect not to provide the corresponding information. The resulting large number of missing attributes causes great difficulty for many automatic screening algorithms. To solve this problem, a Bayesian Possibilistic C-Means (BPCM) clustering algorithm is proposed to discover the representative patterns in the complete data and to estimate the missing values of a given sample from its closest representative pattern. After the data completion step, a two-stage fuzzy ensemble learning scheme is proposed to derive the final screening result. In the first stage, bootstrap aggregation (bagging) is used to sample the entire class-imbalanced dataset into a number of class-balanced subsets. In the second stage, a number of weak classifiers are trained on each subset, and a fuzzy-logic-based approach is designed to analyze the outputs of the weak classifiers and obtain the final classification result. Experiments were conducted on a dataset of 858 patients. The results show that the proposed BPCM can effectively discover the underlying patterns and estimates the missing attributes more reliably than the traditional approaches. Moreover, with the proposed fuzzy ensemble learning scheme, the final classification results on the data completed by BPCM are promising (an accuracy of 76% or a positive sensitivity of 79%) under the severe missing-attribute scenario (only 6% of samples have complete data).
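As a rough illustration of two steps named in the summary (filling missing values from the closest representative pattern, and drawing class-balanced bootstrap subsets), a minimal Python sketch is given below. It is not the authors' BPCM or their fuzzy ensemble: the prototypes are assumed to be already available, the distance is assumed to be squared Euclidean over the observed attributes, and the function names `impute_from_prototypes` and `balanced_bootstrap_subsets` are hypothetical.

```python
import random


def impute_from_prototypes(sample, prototypes):
    """Fill None entries of `sample` from its nearest prototype.

    Assumption: distance is squared Euclidean over observed attributes only.
    """
    observed = [i for i, v in enumerate(sample) if v is not None]

    def dist(proto):
        return sum((sample[i] - proto[i]) ** 2 for i in observed)

    nearest = min(prototypes, key=dist)
    return [nearest[i] if v is None else v for i, v in enumerate(sample)]


def balanced_bootstrap_subsets(labels, n_subsets, seed=0):
    """Draw index subsets with equal class counts, sampling with replacement."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    # Assumption: each class contributes as many draws as the minority class has members.
    per_class = min(len(ix) for ix in by_class.values())
    subsets = []
    for _ in range(n_subsets):
        subset = []
        for ix in by_class.values():
            subset.extend(rng.choices(ix, k=per_class))
        subsets.append(subset)
    return subsets


# Toy data: two made-up prototypes and one sample with a missing attribute.
prototypes = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
completed = impute_from_prototypes([9.0, None, 11.0], prototypes)
print(completed)

# Toy imbalanced labels: eight negatives, two positives.
labels = [0] * 8 + [1] * 2
subsets = balanced_bootstrap_subsets(labels, n_subsets=3)
print([sorted(labels[i] for i in s) for s in subsets])
```

In a pipeline of the kind the summary describes, one weak classifier would presumably be trained per subset and the outputs combined afterwards; the fuzzy-logic combination rule is not specified in this record, so it is not sketched here.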
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2019.05.089