Deep learning based on biologically interpretable genome representation predicts two types of human adaptation of SARS-CoV-2 variants

Bibliographic Details
Published in Briefings in bioinformatics Vol. 23; no. 3
Main Authors Li, Jing; Wu, Ya-Nan; Zhang, Sen; Kang, Xiao-Ping; Jiang, Tao
Format Journal Article
Language English
Published England Oxford University Press 13.05.2022
Oxford Publishing Limited (England)
Summary: Explosively emerging SARS-CoV-2 variants challenge current nomenclature schemes based on genetic diversity and biological significance. Genomic composition-based machine learning methods have recently performed well in identifying phenotype–genotype relationships. We introduced a framework involving dinucleotide (DNT) composition representation (DCR) to parse the general human adaptation of RNA viruses and applied a three-dimensional convolutional neural network (3D CNN) analysis to learn the human adaptation of other existing coronaviruses (CoVs) and predict the adaptation of SARS-CoV-2 variants of concern (VOCs). A markedly separable, linear DCR distribution was observed in two major genes, receptor-binding glycoprotein and RNA-dependent RNA polymerase (RdRp), of six families of single-stranded RNA (ssRNA) viruses. Additionally, there was a general host-specific distribution of both the spike proteins and RdRps of CoVs. The 3D CNN based on spike DCR predicted a dominant type II adaptation of most Beta, Delta and Omicron VOCs, with high transmissibility and low pathogenicity. Type I adaptation, with the opposite transmissibility and pathogenicity profile, was predicted for SARS-CoV-2 Alpha VOCs (77%) and Kappa variants of interest (58%). The identified adaptive determinants included D1118H and A570D mutations and local DNTs. Thus, the 3D CNN model based on DCR features predicts a predominantly type II human adaptation of SARS-CoV-2 and is qualified to predict variant adaptation in real time, facilitating the risk assessment of emerging SARS-CoV-2 variants and COVID-19 control.
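Illustration: the summary refers to a dinucleotide (DNT) composition representation but does not define it in this record. As a rough sketch only, and not the authors' DCR method, the simplest form of dinucleotide composition is the relative frequency of each of the 16 possible dinucleotides in a sequence. The function name `dinucleotide_frequencies`, the overlapping-window counting, and the toy sequence below are all assumptions made for illustration.

```python
from itertools import product

# The 16 possible RNA dinucleotides (AA, AC, ..., UU).
NUCLEOTIDES = "ACGU"
DINUCLEOTIDES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]


def dinucleotide_frequencies(sequence: str) -> dict[str, float]:
    """Return the relative frequency of each dinucleotide in `sequence`.

    Hypothetical helper: counts overlapping dinucleotides and skips any
    window containing an ambiguous base (e.g. N). This is plain
    dinucleotide composition, not the paper's DCR encoding.
    """
    # Treat DNA-style input as RNA so that T and U are counted together.
    seq = sequence.upper().replace("T", "U")
    counts = {dnt: 0 for dnt in DINUCLEOTIDES}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {dnt: 0.0 for dnt in DINUCLEOTIDES}
    return {dnt: n / total for dnt, n in counts.items()}


# Toy sequence (not real viral data): 8 overlapping dinucleotides.
freqs = dinucleotide_frequencies("AUGGCUAUG")
```

A 16-value vector of this kind, computed per gene or per sequence window, is the sort of composition feature that could be fed to a classifier; how the paper arranges such features for its 3D CNN is not described in this record.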
Leading contact.
Jing Li, Ya-Nan Wu and Sen Zhang contributed equally to this work.
ISSN: 1467-5463; 1477-4054
DOI: 10.1093/bib/bbac036