A deep learning model based on sparse auto-encoder for prioritizing cancer-related genes and drug target combinations

Bibliographic Details
Published in: Carcinogenesis (New York) Vol. 40; no. 5; pp. 624-632
Main Authors: Chang, Ji-Wei; Ding, Yuduan; Tahir Ul Qamar, Muhammad; Shen, Yin; Gao, Junxiang; Chen, Ling-Ling
Format Journal Article
Language: English
Published England 04.07.2019
Summary: Prioritizing cancer-related genes from gene expression profiles and proteomic data is vital for improving targeted-therapy research. Although computational approaches have complemented high-throughput biological experiments in the study of human diseases, it remains a major challenge to accurately discover cancer-related proteins/genes by automatic learning from large-scale protein/gene expression data and protein-protein interaction data. Most existing methods combine network construction with gene expression profiles and ignore the differences between normal samples and disease cell lines. In this study, we introduce a deep learning model based on a sparse auto-encoder that learns the specific characteristics of protein interactions in cancer cell lines, integrated with protein expression data. By extracting characteristics of the protein interaction information, the model learned to identify cancer-related proteins/genes from different input protein expression profiles, and it could also predict cancer-related protein combinations. Compared with other reported methods, including differential expression and network-based methods, our model achieved the highest area under the curve value (>0.8) in predicting cancer-related genes. Our study prioritized ~500 high-confidence cancer-related genes, among which 211 were already known cancer drug targets, supporting the accuracy of our method. These results indicate that the proposed auto-encoder model can computationally prioritize candidate proteins/genes involved in cancer and improve targeted-therapy research.
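The summary names a sparse auto-encoder but does not describe its architecture or loss. As a rough illustration only, the sketch below computes one common sparse auto-encoder objective: mean squared reconstruction error plus a KL-divergence penalty that pushes the average hidden activation toward a small target. The layer sizes, the sigmoid activation, the KL penalty, the parameters `rho` and `beta`, and all function names are assumptions made for this example and are not taken from the article.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W, b):
    """One dense layer with sigmoid activation: sigmoid(W x + b)."""
    return [
        sigmoid(sum(w * x_j for w, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(W, b)
    ]


def kl_sparsity(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))


def sparse_ae_loss(X, W_enc, b_enc, W_dec, b_dec, rho=0.05, beta=3.0):
    """Return (total, mse, penalty) for a one-hidden-layer auto-encoder.

    rho  : assumed target mean activation of each hidden unit
    beta : assumed weight of the sparsity penalty
    """
    n = len(X)
    hidden = [forward(x, W_enc, b_enc) for x in X]
    recon = [forward(h, W_dec, b_dec) for h in hidden]

    # Mean (over samples) of the squared reconstruction error.
    mse = sum(
        sum((r - x_i) ** 2 for r, x_i in zip(rec, x))
        for rec, x in zip(recon, X)
    ) / n

    # Average activation of each hidden unit across the samples.
    rho_hat = [sum(h[j] for h in hidden) / n for j in range(len(b_enc))]
    penalty = sum(kl_sparsity(rho, r) for r in rho_hat)

    return mse + beta * penalty, mse, penalty


# Toy data standing in for expression profiles scaled to [0, 1] (hypothetical).
random.seed(0)
n_features, n_hidden = 6, 3
X = [[random.random() for _ in range(n_features)] for _ in range(4)]
W_enc = [[random.uniform(-0.5, 0.5) for _ in range(n_features)]
         for _ in range(n_hidden)]
b_enc = [0.0] * n_hidden
W_dec = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
         for _ in range(n_features)]
b_dec = [0.0] * n_features

total, mse, penalty = sparse_ae_loss(X, W_enc, b_enc, W_dec, b_dec)
print(f"total={total:.4f} mse={mse:.4f} sparsity_penalty={penalty:.4f}")
```

In practice such a loss would be minimized with gradient descent in a deep learning framework; the pure-Python form here is only meant to make the two terms of the objective explicit.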
ISSN: 0143-3334, 1460-2180
DOI: 10.1093/carcin/bgz044