A coverage criterion for spaced seeds and its applications to support vector machine string kernels and k-mer distances
Published in | Journal of computational biology Vol. 21; no. 12; p. 947 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | United States 01.12.2014 |
Summary: | Spaced seeds have recently been shown not only to detect more alignments, but also to give a more accurate measure of phylogenetic distances and to provide a lower misclassification rate when used with Support Vector Machines (SVMs). We confirm these two results by independent experiments, and propose in this article to use a coverage criterion to measure seed efficiency in both cases in order to design better seed patterns. We first show how this coverage criterion can be measured directly by a full automaton-based approach. We then illustrate how this criterion performs when compared with two other frequently used criteria, namely the single-hit and multiple-hit criteria, through correlation coefficients with the correct classification/the true distance. Finally, for alignment-free distances, we propose an extension that adopts the coverage criterion, show how it performs, and indicate how it can be computed efficiently. |
---|---|
ISSN: | 1557-8666 |
DOI: | 10.1089/cmb.2014.0173 |
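
The three criteria named in the summary can be illustrated on a single alignment. The sketch below is not the article's automaton-based method; it is a brute-force illustration that assumes the usual convention in which an alignment is a string of `1` (match) and `0` (mismatch), a seed is a string of `1` (must match) and `0` (don't care), and coverage counts the match positions covered by a must-match position of at least one seed hit. The function names `seed_hits` and `coverage` are hypothetical.

```python
from typing import List


def seed_hits(alignment: str, seed: str) -> List[int]:
    """Return the start positions at which the seed hits the alignment.

    A hit occurs at `start` when every '1' (must-match) position of the
    seed faces a '1' (match) in the alignment.
    """
    must_match = [i for i, c in enumerate(seed) if c == "1"]
    last_start = len(alignment) - len(seed)
    return [
        start
        for start in range(last_start + 1)
        if all(alignment[start + i] == "1" for i in must_match)
    ]


def coverage(alignment: str, seed: str) -> int:
    """Count alignment matches covered by a must-match position of any hit.

    Assumed definition: overlapping hits do not count a match twice.
    """
    must_match = [i for i, c in enumerate(seed) if c == "1"]
    covered = set()
    for start in seed_hits(alignment, seed):
        covered.update(start + i for i in must_match)
    return len(covered)


if __name__ == "__main__":
    alignment = "1110111011"   # 1 = match, 0 = mismatch
    seed = "1101"              # weight 3, span 4
    hits = seed_hits(alignment, seed)
    print("single-hit:", len(hits) >= 1)
    print("multiple-hit count:", len(hits))
    print("coverage:", coverage(alignment, seed))
```

Under these assumptions the single-hit criterion only asks whether `seed_hits` is non-empty, the multiple-hit criterion looks at how many hits there are, and the coverage criterion looks at how many matches those hits account for together.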