Establishment of an AdaBoost classifier model and evaluation of harmful mutations in non-coding regions of liver cancer cells
Published in | Shanghai jiao tong da xue xue bao. Yi xue ban Vol. 35; no. 6; p. 819 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | Chinese English |
Published | 28.06.2015 |
ISSN | 1674-8115 |
Summary: | Objective: To build an AdaBoost classifier that estimates the probability that a mutation in the non-coding regions of liver cancer cells is disease-related, and to identify harmful non-coding mutations. Methods: A total of 13 108 disease-related non-coding mutations selected from the HGMD database served as cases, and neutral SNPs served as controls. The AdaBoost classifier was built from regulatory features of non-coding regions, such as conserved regions, evolutionarily conserved RNA structures, highly expressed genes, DNase I hypersensitive sites, transcription factor binding sites (TFBSs), histone modifications, and early-replicating genes. The value of each feature for predicting harmful non-coding mutations was analyzed. The receiver operating characteristic (ROC) curve was plotted and the area under it (AUC_ROC) was calculated. Genome-wide association study (GWAS) variants and ClinVar disease-associated variants were used to validate the model. Results: Ranked by importance for identifying disease-related mutations, the features were conserved regions, early-replicating genes, untranslated regions (UTRs), promoters, highly expressed regions, H3K36me3, and conserved TFBSs. The ROC curve built from the classifier's predicted probabilities gave an AUC_ROC of 0.90. The mean scores of GWAS and ClinVar disease-associated variants were significantly higher than those of neutral SNPs (P<0.05). Conclusion: The AdaBoost classifier helps estimate the probability that a non-coding mutation in liver cancer cells is harmful and is an accurate prediction tool. |
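
The workflow described in the summary (boosting over binary regulatory annotations, then scoring with ROC AUC) can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the synthetic data, the per-feature probabilities, and every function name below are hypothetical, and plain decision stumps stand in for whatever weak learner the study used.

```python
import math
import random

# Hypothetical binary annotations, loosely named after features in the summary.
FEATURES = ["conserved_region", "early_replicating", "utr", "promoter", "h3k36me3"]


def make_synthetic(n, seed=0):
    """Toy variants labelled +1 (disease-related) or -1 (neutral SNP)."""
    rng = random.Random(seed)
    # Assumed annotation frequencies; these numbers are NOT from the paper.
    p_pos = [0.7, 0.6, 0.5, 0.5, 0.4]
    p_neg = [0.2, 0.3, 0.3, 0.3, 0.3]
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        probs = p_pos if label == 1 else p_neg
        X.append([1 if rng.random() < p else 0 for p in probs])
        y.append(label)
    return X, y


def stump_predict(x, feature, polarity):
    """Decision stump on one binary feature; returns +1 or -1."""
    return polarity if x[feature] == 1 else -polarity


def fit_adaboost(X, y, rounds=20):
    """Discrete AdaBoost over single-feature stumps."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for f in range(len(X[0])):
            for pol in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if stump_predict(xi, f, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, pol)
        err, f, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, pol))
        # Up-weight misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, pol))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, x):
    """Weighted vote of all stumps; higher means more likely disease-related."""
    return sum(a * stump_predict(x, f, p) for a, f, p in model)


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    X, y = make_synthetic(400)
    model = fit_adaboost(X, y)
    print("training AUC:", auc([score(model, x) for x in X], y))
```

The AUC here is computed on the training data for brevity; a real evaluation would score held-out variants, as the study did with GWAS and ClinVar sets.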