Survey on Nucleotide Encoding Techniques and SVM Kernel Design for Human Splice Site Prediction
Published in | Interdisciplinary bio central Vol. 4; no. 4; pp. 14.1 - 14.6 |
---|---|
Format | Journal Article |
Language | Korean |
Published | 2012 |
Summary: | Splice site prediction in DNA sequences is a basic search problem: finding exon/intron and intron/exon boundaries. Removing the introns and joining the exons together forms the mRNA sequence, which is the input to the translation process. This step is necessary in the central dogma of molecular biology. The main task of splice site prediction is first to locate all candidate GT- and AG-ended sequences, and then to distinguish the true sites from the false ones among those candidates. In this paper, we survey research on splice site prediction based on support vector machines (SVMs). These works differ mainly in their nucleotide encoding technique and SVM kernel selection. Some methods encode the DNA sequence sparsely, whereas others encode it probabilistically. The encoded sequences serve as input to the SVM, which classifies them using its learned model. Classification accuracy depends largely on selecting a kernel suited to sequence data and on the choice of kernel parameters. We examine each encoding technique and group the techniques by similarity. We then discuss kernels and their parameter selection. This survey provides a basic understanding of encoding approaches and kernel selection for SVM-based splice site prediction. |
---|---|
Bibliography: | KISTI1.1003/JNL.JAKO201210635653679 |
ISSN: | 2005-8543 |
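
The summary above mentions locating GT and AG candidates and encoding DNA "in a sparse way". As a rough illustration only, the sketch below scans a sequence for candidate dinucleotides and one-hot encodes a fixed window around each. The function names, the 4-bit code per base, and the `flank` window size are assumptions made for this example; they are not taken from the surveyed paper.

```python
# Illustrative sketch only: names, 4-bit codes, and window size are assumptions.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def candidate_sites(seq, dimer):
    """Return the start index of every occurrence of `dimer` (e.g. 'GT' or 'AG')."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dimer]


def sparse_encode(window):
    """Concatenate one 4-bit code per base; unknown bases (e.g. N) become all zeros."""
    vec = []
    for base in window.upper():
        vec.extend(ONE_HOT.get(base, (0, 0, 0, 0)))
    return vec


def encode_candidates(seq, dimer, flank=3):
    """Encode `flank` bases on each side of every candidate dimer (hypothetical window)."""
    encoded = []
    for i in candidate_sites(seq, dimer):
        start, end = i - flank, i + 2 + flank
        if start < 0 or end > len(seq):
            continue  # skip candidates too close to either end of the sequence
        encoded.append((i, sparse_encode(seq[start:end])))
    return encoded
```

Each returned vector has `4 * (2 * flank + 2)` entries. Vectors of this kind could then be passed, together with true/false labels, to an SVM implementation of the reader's choice; the kernel and its parameters would still need to be selected as the summary describes.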