A statistical approach to coronavirus classification based on nucleotide distributions

Bibliographic Details
Published in: Mathematical Modeling and Computing, Vol. 11, no. 4, pp. 987–994
Main Authors: Husiev, M., Rovenchak, A.
Format: Journal Article
Language: English
Published: 2024
ISSN: 2312-9794, 2415-3788
DOI: 10.23939/mmc2024.04.987

Summary: This study analyzes coronavirus RNA genomes using parameters derived from the distributions of nucleotide sequences in them. Each viral RNA was split into nucleotide sequences by treating one nucleotide base (adenine) as a "whitespace" separator, with empty sequences denoted as "x". The statistical spectra constructed in this way exhibited three distinct peaks that were consistent across the studied species. Parameters based on the rank–frequency distributions of the obtained nucleotide sequences, on sequence lengths, and on some other statistical properties were calculated. Principal components built from these parameters served as the basis for grouping the studied viruses. The most relevant parameters were used to build a naïve Bayes classifier, which estimates the probability that a virus belongs to a given group of viruses in the model.
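
The splitting step in the summary can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy sequence, and the handling of the ends of the sequence (an empty piece at the very start or end is dropped) are assumptions made only for this illustration.

```python
from collections import Counter


def split_on_adenine(sequence: str) -> list[str]:
    """Split a nucleotide string at every adenine ("A").

    Empty pieces (two adjacent adenines) are recorded as "x".
    Assumption for this sketch: an empty piece at the very start
    or end of the sequence is dropped rather than counted.
    """
    pieces = sequence.upper().split("A")
    if pieces and pieces[0] == "":
        pieces = pieces[1:]
    if pieces and pieces[-1] == "":
        pieces = pieces[:-1]
    return [piece if piece else "x" for piece in pieces]


def rank_frequency(tokens: list[str]) -> list[tuple[str, int]]:
    """Return (token, count) pairs ordered from most to least frequent."""
    return Counter(tokens).most_common()


def length_distribution(tokens: list[str]) -> Counter:
    """Count tokens by length; the empty marker "x" has length 0."""
    return Counter(0 if token == "x" else len(token) for token in tokens)


# Toy input, not a real genome fragment.
toy_sequence = "GAAUUCAGGAACA"
toy_tokens = split_on_adenine(toy_sequence)
toy_ranks = rank_frequency(toy_tokens)
toy_lengths = length_distribution(toy_tokens)
```

Here `toy_tokens` holds the sequences between adenines, `toy_ranks` is their rank–frequency list, and `toy_lengths` counts sequences by length. Parameters such as those mentioned in the summary would be computed from distributions of this kind; which parameters the authors actually used is not stated in this record.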