Hybrid Method Based on Information Gain and Support Vector Machine for Gene Selection in Cancer Classification

Bibliographic Details
Published in Genomics, Proteomics & Bioinformatics (基因组蛋白质组与生物信息学报:英文版) Vol. 15; no. 6; pp. 389-395
Main Authors Lingyun Gao, Mingquan Ye, Xiaojie Lu, Daobin Huang
Format Journal Article
Language English
Published 2017

Summary: Achieving sufficient cancer classification accuracy with the entire set of genes remains a great challenge because gene expression data are high-dimensional, noisy, and drawn from small samples. We thus proposed a hybrid gene selection method, Information Gain-Support Vector Machine (IG-SVM), in this study. IG was initially employed to filter irrelevant and redundant genes. Then, further removal of redundant genes was performed using SVM to eliminate the noise in the datasets more effectively. Finally, the informative genes selected by IG-SVM served as the input for the LIBSVM classifier. Compared to other related algorithms, IG-SVM showed the highest classification accuracy and superior performance when evaluated on five cancer gene expression datasets using only a few selected genes. As an example, IG-SVM achieved a classification accuracy of 90.32% for colon cancer, which is difficult to classify accurately, based on only three genes: CSRP1, MYL9, and GUCA2B.
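The first stage described in the summary, ranking genes by information gain, can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not say how expression values were discretized, so splitting each gene at its mean expression is an assumption made here, and the function names (`entropy`, `information_gain`, `rank_genes`) and the toy data are hypothetical. The SVM-based removal step and the LIBSVM classifier are not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Information gain from splitting samples at `threshold` on one gene."""
    high = [y for x, y in zip(values, labels) if x > threshold]
    low = [y for x, y in zip(values, labels) if x <= threshold]
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    conditional = sum(
        len(part) / len(labels) * entropy(part) for part in (high, low) if part
    )
    return entropy(labels) - conditional


def rank_genes(expression, labels):
    """Score each gene by IG, splitting at the gene's mean (an assumption)."""
    scores = {}
    for gene, values in expression.items():
        threshold = sum(values) / len(values)
        scores[gene] = information_gain(values, labels, threshold)
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Toy data invented for illustration; not taken from the paper.
expression = {
    "gene_a": [0.1, 0.2, 0.9, 1.0],  # separates the two classes
    "gene_b": [0.5, 0.9, 0.4, 1.0],  # carries no class information
}
labels = ["normal", "normal", "tumor", "tumor"]

ranking, scores = rank_genes(expression, labels)
print(ranking)
```

In a filter stage of this kind, genes whose score falls below a chosen cutoff would be dropped before any classifier is trained; the cutoff itself is a tuning choice that the summary does not specify.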
Bibliography: Lingyun Gao; Mingquan Ye; Xiaojie Lu; Daobin Huang
11-4926/Q
Gene selection; Cancer classification; Information gain; Support vector machine; Small sample size with high dimension
ISSN:1672-0229