Hypergraph-based Gene Ontology Embedding for Disease Gene Prediction

Bibliographic Details
Published in: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2424-2430
Main Authors: Wang, Tao; Xu, Hengbo; Zhang, Ranye; Xiao, Yifu; Peng, Jiajie; Shang, Xuequn
Format: Conference Proceeding
Language: English
Published: IEEE, 06.12.2022
Summary: Disease gene identification provides valuable insight into the molecular mechanisms underlying complex diseases, and novel drugs with genetically supported targets have been shown to be more likely to succeed in clinical trials. In recent years, multiple graph machine learning methods have been proposed for this purpose. However, these methods are mainly based on well-established biological molecular networks and seldom consider the curated biological annotations of genes. To fill this gap, we integrate gene ontology annotations (GOA), covering biological process (BP), cellular component (CC), and molecular function (MF), into disease gene prediction. Our method treats the GOA as a hypergraph and uses a hypergraph-based embedding technique to extract the deep features underlying gene annotations. We also extract gene features from the protein-protein interaction (PPI) network using graph representation learning methods. A convolutional neural network (CNN) then fuses the features extracted from the two networks and makes the final prediction. Experiments on a range of diseases demonstrate the accuracy and robustness of our method: the average area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC) reached 0.85 and 0.79, respectively. The hypergraph-based gene ontology embedding can also be generalized to other bioinformatics applications.
DOI: 10.1109/BIBM55620.2022.9995140
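
The idea in the Summary of treating gene ontology annotations as a hypergraph can be illustrated with a small sketch: each GO term becomes a hyperedge that joins every gene annotated with it. The record does not say which hypergraph embedding technique the authors used, so the example below substitutes a simple spectral embedding based on the normalized hypergraph Laplacian of Zhou, Huang and Schölkopf with unit hyperedge weights. The gene names, term identifiers, and the functions `incidence_matrix` and `spectral_embedding` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy annotations: GO term -> genes annotated with that term.
# Each term is one hyperedge; it may connect any number of genes.
annotations = {
    "GO:A": {"g1", "g2", "g3"},
    "GO:B": {"g2", "g3"},
    "GO:C": {"g4", "g5"},
    "GO:D": {"g3", "g4", "g5"},
}


def incidence_matrix(annotations):
    """Return (genes, terms, H), where H[i, j] = 1 if gene i carries term j."""
    genes = sorted(set().union(*annotations.values()))
    terms = sorted(annotations)
    row = {gene: i for i, gene in enumerate(genes)}
    H = np.zeros((len(genes), len(terms)))
    for j, term in enumerate(terms):
        for gene in annotations[term]:
            H[row[gene], j] = 1.0
    return genes, terms, H


def spectral_embedding(H, dim=2):
    """Embed genes with eigenvectors of the normalized hypergraph Laplacian.

    Assumes unit hyperedge weights, every gene in at least one hyperedge,
    and no empty hyperedges. This is a stand-in technique, not the
    embedding method of the paper.
    """
    vertex_degree = H.sum(axis=1)   # number of terms per gene
    edge_size = H.sum(axis=0)       # number of genes per term
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(vertex_degree))
    De_inv = np.diag(1.0 / edge_size)
    theta = Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt
    laplacian = np.eye(H.shape[0]) - theta
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    # Skip the first eigenvector (eigenvalue 0) and keep the next `dim`.
    return eigvecs[:, 1:dim + 1]


genes, terms, H = incidence_matrix(annotations)
Z = spectral_embedding(H, dim=2)  # one 2-dimensional vector per gene
for gene, vector in zip(genes, Z):
    print(gene, np.round(vector, 3))
```

In a pipeline like the one the Summary describes, vectors such as the rows of `Z` would be combined with features learned from the PPI network before classification. How the paper performs that fusion beyond using a CNN is not specified in this record.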