A statistical approach to coronavirus classification based on nucleotide distributions
Published in | Mathematical Modeling and Computing Vol. 11; no. 4; pp. 987 - 994 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | 2024 |
ISSN | 2312-9794 2415-3788 |
DOI | 10.23939/mmc2024.04.987 |
Summary: | This study analyzes the RNA genomes of coronaviruses using parameters derived from the distributions of nucleotide sequences in their RNA. The viral RNA was split into nucleotide sequences by treating one nucleotide base (adenine) as a "whitespace", with empty sequences denoted as "x". Statistical spectra constructed from these sequences exhibited three distinct peaks that were consistent across the studied species. Parameters based on the rank–frequency distributions of the obtained nucleotide sequences, on sequence lengths, and on some other statistical quantities were calculated. Principal components built from these parameters served as the basis for grouping the studied viruses. The most relevant parameters were used to build a naïve Bayes classifier, which estimates the probability that a virus belongs to a given group of viruses in the model. |
---|---|
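
The splitting step described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `split_on_adenine` and `rank_frequency`, the toy input string, and the handling of a genome that begins or ends with adenine (each boundary yields an "x" here) are all assumptions made for illustration.

```python
from collections import Counter


def split_on_adenine(genome: str, base: str = "A") -> list[str]:
    """Treat every occurrence of `base` as whitespace.

    Empty sequences (e.g. between two adjacent adenines) are denoted "x".
    Assumption: a leading or trailing adenine also produces an "x".
    """
    tokens = genome.upper().split(base)
    return [tok if tok else "x" for tok in tokens]


def rank_frequency(tokens: list[str]) -> list[tuple[int, str, int]]:
    """Return (rank, sequence, count) rows sorted by decreasing count.

    Ties are broken alphabetically, which is an arbitrary choice.
    """
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, tok, n) for rank, (tok, n) in enumerate(ordered, start=1)]


# Toy input, not a real viral genome.
toy = "GAATTCAAGCTAG"
tokens = split_on_adenine(toy)
table = rank_frequency(tokens)

for rank, tok, n in table:
    print(rank, tok, n)
```

For the toy string, the tokens are `G`, `x`, `TTC`, `x`, `GCT`, `G`. Parameters of the kind mentioned in the summary (rank–frequency and sequence-length statistics) would be computed from tables like `table` for each genome before any principal component analysis or classification.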