Classification of high dimensional biomedical data based on feature selection using redundant removal

Bibliographic Details
Published in	PloS one Vol. 14; no. 4; p. e0214406
Main Authors	Zhang, Bingtao, Cao, Peng
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 09.04.2019
Subjects	Algorithms; Artificial intelligence; Automatic classification; Biogeography; Biology and Life Sciences; Biomedical data; Brain cancer; Classification; Computer and Information Sciences; Disease; Early Detection of Cancer; Earth Sciences; Ecology and Environmental Sciences; Engineering; Forecasting; Gene expression; Geospatial data; Glioblastoma - diagnosis; Glioblastoma - pathology; Humans; Learning algorithms; Linear programming; Machine Learning; Medical diagnosis; Medical informatics; Medical research; Medicine and Health Sciences; Models, Theoretical; Novels; Physical Sciences; Product acceptance; Redundancy; Research and Analysis Methods; Research Design; China
Summary:	High-dimensional biomedical data contain tens of thousands of features, and accurately and efficiently identifying the core features in these data can assist in diagnosing related diseases. However, biomedical data often contain many irrelevant or redundant features, which seriously degrade subsequent classification accuracy and machine learning efficiency. To solve this problem, this paper proposes a novel filter feature selection algorithm based on redundant removal (FSBRR) for classifying high-dimensional biomedical data. First, two redundancy criteria are determined by vertical relevance (the relationship between a feature and the class attribute) and horizontal relevance (the relationship between one feature and another). Second, to quantify the redundancy criteria, an approximate redundancy feature framework based on mutual information (MI) is defined to remove redundant and irrelevant features. To evaluate the effectiveness of the proposed algorithm, controlled trials based on typical feature selection algorithms are conducted using three different classifiers, and the experimental results indicate that the FSBRR algorithm can effectively reduce the feature dimension and improve classification accuracy. In addition, an experiment on a small-sample dataset is designed and conducted in the discussion and analysis section to illustrate the specific implementation process of the FSBRR algorithm more clearly.
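The summary describes the method only at a high level, so the following is a minimal sketch of the general idea rather than the published FSBRR procedure. It assumes discrete feature values, estimates mutual information from empirical counts, and uses two hypothetical rules that are not taken from the paper: a feature is treated as irrelevant when its MI with the class labels falls below an assumed threshold `relevance_min`, and as redundant when it shares at least as much MI with an already-kept feature as it does with the labels. The names `mutual_information` and `select_features` are illustrative.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum p(a,b) * log2( p(a,b) / (p(a) * p(b)) ) over observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def select_features(features, labels, relevance_min=0.1):
    """Greedy MI-based filter (assumed rules, not the authors' exact criteria).

    features: dict mapping a feature name to a list of discrete values.
    labels:   list of class labels of the same length.
    """
    # "Vertical" relevance: MI between each feature and the class labels.
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    ranked = sorted(features, key=lambda name: relevance[name], reverse=True)

    kept = []
    for name in ranked:
        if relevance[name] < relevance_min:
            continue  # irrelevant under the assumed threshold
        # "Horizontal" relevance: MI between this feature and each kept feature.
        redundant = any(
            mutual_information(features[name], features[k]) >= relevance[name]
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1],  # fully informative about the labels
    "f2": [1, 1, 1, 1, 0, 0, 0, 0],  # inverted copy of f1, hence redundant
    "f3": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the labels
}
print(select_features(features, labels))
```

In this toy input, `f2` carries the same information as `f1` and `f3` carries none about the labels, so only `f1` survives. Real gene expression data are continuous and would need discretization or a different MI estimator before a filter like this could be applied.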
Bibliography:	Competing Interests: The authors have declared that no competing interests exist.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0214406