Classification of lung cancer subtypes by data mining technique
Published in | Proceedings of The 2014 International Conference on Control, Instrumentation, Energy and Communication (CIEC) pp. 558 - 562 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.01.2014 |
Subjects | |
Summary: | Lung cancer is the leading cause of cancer-related deaths worldwide. Classification and characterization of cancer treatment strategies are essential in the current medical era. Gene mutations and their altered expression are the basis of cancer development. This paper proposes analyzing these gene mutations and gene expression data for the phenotypic classification of lung cancer. Genomic and proteomic data sets (biomarkers) of Non-Small Cell Lung Cancer (NSCLC) and its two major subtypes, Squamous Cell Cancer (SCC) and adenocarcinoma (ADC), were analyzed in this study. The biomarkers included in the genomic and proteomic data sets are microRNAs, genes and their proteins. An integrated decision tree induction algorithm for classification is applied to these NSCLC biomarkers to make predictions. The knowledge derived by the proposed algorithm has high classification accuracy and can predict the cancer type. Cross-validation is applied, which further enhances the classification accuracy of the J48 algorithm. Our contribution thus includes, first, constructing a decision tree for lung cancer subtypes with the J48 algorithm in the Weka tool and predicting the lung cancer type for samples of unknown class. Second, we have compared the outputs obtained using the J48 algorithm with those of an improved decision tree (J48). Through the construction of the decision tree, the top ten classification rules for predicting lung cancer are obtained using the Apriori algorithm (Weka tool). The average correct classification accuracy is nearly 99.7%, but many of the rules that are of interest to the user are pruned. The classification rules obtained by the improved decision tree depend on user decisions, which allows an unlimited number of rules to be derived based on the selection of attribute values. The improved decision tree has shown a good improvement over the J48 algorithm. The findings are considered helpful reference rules in the diagnosis and drug development of SCC and ADC cancers.
The accurate differential diagnosis of lung cancer from knowledge of biomarkers could reduce the pain of histopathological examination for patients. |
---|---|
DOI: | 10.1109/CIEC.2014.6959151 |