Feature selection with ensemble learning for prostate cancer diagnosis from microarray gene expression

Bibliographic Details
Published in	Health informatics journal Vol. 27; no. 1; p. 1460458221989402
Main Authors	Gumaei, Abdu; Sammouda, Rachid; Al-Rakhami, Mabrook; AlSalman, Hussain; El-Zaart, Ali
Format	Journal Article
Language	English
Published	London, England: SAGE Publications, 01.01.2021
Subjects	Algorithms; Feature selection; Gene expression; Machine learning; Medical diagnosis; Prostate cancer; Ensemble learning; Microarray data; Random committee; 10-fold cross-validation
Summary:	Cancer diagnosis using machine learning algorithms is one of the main research topics in computer-based medical science. Prostate cancer is considered one of the leading causes of death worldwide. Analyzing microarray gene expression data with machine learning and soft computing algorithms is a useful tool for detecting prostate cancer in medical diagnosis. Although traditional machine learning methods have been applied successfully to prostate cancer detection, microarray data combine a large number of attributes with a small sample size, which remains a challenge that limits their effectiveness for medical diagnosis. Selecting a subset of relevant features and choosing an appropriate machine learning method can exploit the information in microarray data to improve detection accuracy. In this paper, we propose using a correlation feature selection (CFS) method with random committee (RC) ensemble learning to detect prostate cancer from microarray gene expression data. A set of experiments is conducted on a public benchmark dataset using 10-fold cross-validation to evaluate the proposed approach. The experimental results show that the proposed approach attains 95.098% accuracy, which is higher than that of related methods on the same dataset.
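
The two evaluation ingredients named in the summary, correlation-based feature selection and 10-fold cross-validation, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the standard CFS merit heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), uses absolute Pearson correlation as a stand-in correlation measure, and runs a simple greedy forward search. The function names, the toy data, and the sample count of 25 are all hypothetical. The random committee classifier itself is not implemented here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(columns, labels, subset):
    """CFS merit k*r_cf / sqrt(k + k(k-1)*r_ff), with absolute Pearson correlations (assumption)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean feature-class correlation over the subset.
    r_cf = sum(abs(pearson(columns[j], labels)) for j in subset) / k
    if k == 1:
        return r_cf
    # Mean feature-feature correlation over all pairs in the subset.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, labels):
    """Greedy forward search: add the feature that most improves merit, stop when none does."""
    selected, best = [], 0.0
    remaining = set(range(len(columns)))
    while remaining:
        merit, j = max((cfs_merit(columns, labels, selected + [j]), j) for j in remaining)
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Hypothetical toy data: each inner list is one feature (gene) across 8 samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0.10, 0.20, 0.15, 0.30, 0.90, 1.00, 0.80, 0.95],  # informative
    [0.12, 0.22, 0.14, 0.31, 0.88, 1.02, 0.79, 0.96],  # near-duplicate of feature 0
    [0.50, 0.10, 0.90, 0.30, 0.40, 0.80, 0.20, 0.60],  # weakly related to the class
]
selected, merit = cfs_forward_select(columns, labels)
folds = k_fold_indices(n_samples=25, k=10)
print("selected features:", selected, "merit:", round(merit, 3))
print("fold sizes:", [len(f) for f in folds])
```

In a cross-validated experiment, feature selection would normally be repeated inside each training fold so that the held-out fold does not influence which features are kept. The summary does not say how the authors handled this.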
ISSN:	1460-4582, 1741-2811
DOI:	10.1177/1460458221989402