Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA
Published in | AIP conference proceedings Vol. 1862; no. 1 |
---|---|
Main Authors | , , , |
Format | Journal Article; Conference Proceeding |
Language | English |
Published | Melville: American Institute of Physics, 10.07.2017 |
Summary: | A DNA sequence can be defined as a succession of letters representing the order of nucleotides within DNA, written with the four DNA base codes adenine (A), guanine (G), cytosine (C), and thymine (T). The precise sequence is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have now become advanced, high-throughput sequencing technologies. DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing produces ambiguous or unclear results, making it difficult to determine whether a given base is A, T, G, or C. To address this problem, in this study we introduce an alternative representation of DNA codes, namely the Quaternion $Q = (P_A, P_T, P_G, P_C)$, where $P_A$, $P_T$, $P_G$, and $P_C$ are the probabilities of the A, T, G, and C bases appearing in $Q$, with $P_A + P_T + P_G + P_C = 1$. Using the Quaternion representation, we construct an improved scoring matrix for global sequence alignment by applying a dot product method. This scoring matrix yields higher-quality match and mismatch scores between two DNA base codes. In the implementation, we applied the Needleman-Wunsch global sequence alignment algorithm in Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are DNA sequences of Streptococcus pneumoniae families obtained from GenBank, while the target DNA sequence was obtained from our collaborator's database. We found that the Quaternion representation improves the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae. |
---|---|
ISSN: | 0094-243X (print); 1551-7616 (online) |
DOI: | 10.1063/1.4991226 |
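The summary describes representing each base call as a probability vector $Q = (P_A, P_T, P_G, P_C)$ and scoring two calls by a dot product, but it does not say how ambiguous calls are converted into probabilities or how a dot product becomes a match/mismatch score. The Python sketch below fills both gaps with labeled assumptions: each IUPAC ambiguity code (hypothetical choice, not stated in the paper) spreads probability uniformly over the bases it permits, and the score is a linear map where a dot product of 1 gives the match score and 0 gives the mismatch score. The names `quaternion`, `score`, and `scoring_matrix` are illustrative only.

```python
# Assumption: ambiguous calls are IUPAC codes, and each code spreads
# probability uniformly over the bases it allows. Not taken from the paper.
IUPAC = {
    "A": "A", "T": "T", "G": "G", "C": "C",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
ORDER = "ATGC"  # vector order follows Q = (P_A, P_T, P_G, P_C)


def quaternion(code: str) -> tuple:
    """Return the probability vector (P_A, P_T, P_G, P_C) for one base call."""
    allowed = IUPAC[code.upper()]
    p = 1.0 / len(allowed)
    return tuple(p if base in allowed else 0.0 for base in ORDER)


def dot(q1: tuple, q2: tuple) -> float:
    """Dot product of two probability vectors (probability both calls agree)."""
    return sum(a * b for a, b in zip(q1, q2))


def score(x: str, y: str, match: float = 1.0, mismatch: float = -1.0) -> float:
    """Hypothetical linear map: dot = 1 gives `match`, dot = 0 gives `mismatch`."""
    d = dot(quaternion(x), quaternion(y))
    return mismatch + (match - mismatch) * d


def scoring_matrix(codes: str = "ATGCN", match: float = 1.0,
                   mismatch: float = -1.0) -> dict:
    """Build a substitution matrix over the given codes as a dict keyed by pairs."""
    return {(x, y): score(x, y, match, mismatch) for x in codes for y in codes}


if __name__ == "__main__":
    m = scoring_matrix()
    print("   " + " ".join(f"{c:>5}" for c in "ATGCN"))
    for x in "ATGCN":
        print(f"{x}  " + " ".join(f"{m[(x, y)]:5.2f}" for y in "ATGCN"))
```

For unambiguous bases this reduces to the usual +1/-1 matrix, while a partially ambiguous call such as R (A or G) against A lands between the two, which is the kind of graded score the summary attributes to the dot product method.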
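The alignment step can be sketched as a standard Needleman-Wunsch global alignment whose substitution score comes from the dot product of probability vectors. This is a minimal Python illustration (the authors used Octave), not the paper's implementation: the linear gap penalty of -2, match score of 1, mismatch score of -1, and the restriction to A, T, G, C plus N (fully ambiguous) are all assumptions chosen for the example, and `needleman_wunsch` and `sub_score` are hypothetical names.

```python
# Assumed probability vectors in the order (P_A, P_T, P_G, P_C);
# N is treated as fully ambiguous. Not taken from the paper.
Q = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "T": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "C": (0.0, 0.0, 0.0, 1.0),
    "N": (0.25, 0.25, 0.25, 0.25),
}


def sub_score(x: str, y: str, match: float = 1.0, mismatch: float = -1.0) -> float:
    """Substitution score linearly interpolated by the dot product of Q vectors."""
    d = sum(a * b for a, b in zip(Q[x], Q[y]))
    return mismatch + (match - mismatch) * d


def needleman_wunsch(s: str, t: str, gap: float = -2.0):
    """Global alignment with a linear gap penalty; returns (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Cumulative sums keep the borders bit-identical to the traceback checks.
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + sub_score(s[i - 1], t[j - 1]),  # align two bases
                F[i - 1][j] + gap,                                 # gap in t
                F[i][j - 1] + gap,                                 # gap in s
            )
    # Traceback from the bottom-right cell, preferring diagonal moves on ties.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub_score(s[i - 1], t[j - 1]):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(a)), "".join(reversed(b))


if __name__ == "__main__":
    # A target with one ambiguous call (N) aligned against a definite subject.
    print(needleman_wunsch("ACNT", "ACGT"))
```

Because N against G scores -0.5 rather than the full mismatch penalty, an ambiguous position in the target lowers the alignment score less than a true mismatch would, which is the effect the summary describes when comparing the target against subject sequences.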