Establishment of model of adaboost classifier and evaluation of harmful mutations in non-coding regions of liver cancer cells

Bibliographic Details
Published in: Shanghai jiao tong da xue xue bao. Yi xue ban, Vol. 35, no. 6, p. 819
Main Authors: Xu, Li-ping; Li, Jia; Fang, Lin
Format: Journal Article
Language: Chinese, English
Published: 28.06.2015
ISSN: 1674-8115

Summary:
Objective: To establish an AdaBoost classifier model, evaluate the likelihood that mutations in non-coding regions of liver cancer cells are disease-related, and identify harmful mutations in non-coding regions.
Methods: A total of 13 108 disease-related mutations in non-coding regions were selected from the HGMD database as subjects, and neutral SNPs were used as controls. The AdaBoost classifier was built on regulatory features of non-coding regions, such as conserved regions, evolutionarily conserved RNA structures, highly expressed genes, DNase I hypersensitive sites, transcription factor binding sites (TFBSs), histone modifications, and early replicated genes. The value of these features for predicting harmful mutations in non-coding regions was analyzed. The receiver operating characteristic (ROC) curve was plotted and the area under the ROC curve (AUC_ROC) was calculated. Genome-wide association study (GWAS) variants and disease-associated variants from the ClinVar database were used to verify the model.
Results: The features, sorted by their importance for identifying disease-related mutations, were conserved regions, early replicated genes, untranslated regions (UTRs), promoters, highly expressed regions, H3K36me3, and conserved TFBSs. The ROC curve was built from the prediction probabilities of the AdaBoost classifier, and the AUC_ROC was 0.90. The average scores of GWAS and ClinVar disease-associated variants were significantly higher than those of neutral SNPs (P<0.05).
Conclusion: The AdaBoost classifier is helpful for evaluating the likelihood of harmful mutations in non-coding regions of liver cancer cells and is an accurate prediction tool.
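The workflow in the summary (boost weak learners over binary annotation features, rank the features by importance, then score variants and compute the area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes discrete AdaBoost with one-feature decision stumps, and the function names (make_toy_data, fit_adaboost, importances, score, auc) and the three synthetic features are hypothetical. The toy data is generated in code and has no connection to HGMD, GWAS, or ClinVar.

```python
import math

def make_toy_data():
    """Synthetic 0/1 annotation features (NOT real data); labels are +1 / -1.
    Feature 0 agrees with the label 90% of the time, feature 1 70%, feature 2 is noise."""
    X, y = [], []
    for label in (1, -1):
        hit = 1 if label == 1 else 0
        for i in range(200):
            agree0 = (i % 10) != 0
            agree1 = ((i // 10) % 10) < 7
            noise = i // 100  # 0 or 1, unrelated to the label
            X.append([hit if agree0 else 1 - hit,
                      hit if agree1 else 1 - hit,
                      noise])
            y.append(label)
    return X, y

def fit_adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost with one-feature stumps; returns a signed vote per feature."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    coef = [0.0] * d
    for _ in range(n_rounds):
        best = None  # (weighted error, feature index, polarity)
        for j in range(d):
            for pol in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if pol * (2 * xi[j] - 1) != yi)
                if best is None or err < best[0]:
                    best = (err, j, pol)
        err, j, pol = best
        if err >= 0.5 - 1e-9:  # no stump beats chance under the current weights
            break
        err = max(err, 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        coef[j] += alpha * pol
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * pol * (2 * xi[j] - 1))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return coef

def importances(coef):
    """Share of the total absolute vote carried by each feature."""
    total = sum(abs(c) for c in coef) or 1.0
    return [abs(c) / total for c in coef]

def score(coef, x):
    """Real-valued ensemble score; higher means more likely disease-related."""
    return sum(c * (2 * v - 1) for c, v in zip(coef, x))

def auc(scores, labels):
    """Area under the ROC curve as P(positive outranks negative), ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    X, y = make_toy_data()
    coef = fit_adaboost(X, y)
    print("importances:", importances(coef))
    print("AUC:", auc([score(coef, x) for x in X], y))
```

In this sketch the AUC is computed on the same toy data used for fitting, which overstates performance on real data; a held-out set or independent variant collections, as used for verification in the summary, would be needed for an honest estimate.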