A coverage criterion for spaced seeds and its applications to support vector machine string kernels and k-mer distances

Bibliographic Details
Published in: Journal of Computational Biology, Vol. 21, no. 12, p. 947
Main Authors: Noé, Laurent; Martin, Donald E. K.
Format: Journal Article
Language: English
Published: United States, 01.12.2014
Summary: Spaced seeds have recently been shown not only to detect more alignments, but also to give a more accurate measure of phylogenetic distances and to provide a lower misclassification rate when used with Support Vector Machines (SVMs). We confirm the latter two results by independent experiments, and propose in this article to use a coverage criterion to measure seed efficiency in both cases in order to design better seed patterns. We first show how this coverage criterion can be measured directly by a fully automaton-based approach. We then illustrate how this criterion performs when compared with two other frequently used criteria, namely the single-hit and multiple-hit criteria, through correlation coefficients with the correct classification or the true distance. Finally, for alignment-free distances, we propose an extension that adopts the coverage criterion, show how it performs, and indicate how it can be computed efficiently.
ISSN:1557-8666
DOI:10.1089/cmb.2014.0173
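
The three criteria named in the summary can be illustrated on a toy example. The sketch below is not the authors' automaton-based method; it is a brute-force illustration under stated assumptions: an alignment is encoded as a string of `1` (match) and `0` (mismatch), a seed pattern as a string of `1` (must-match) and `0` (don't-care), and coverage is taken to mean the number of alignment positions covered by a must-match position of at least one seed hit. The function names `seed_hits` and `coverage` are hypothetical.

```python
def seed_hits(alignment: str, seed: str) -> list[int]:
    """Return the start positions at which the seed hits the alignment.

    A hit at position i means every must-match symbol ('1') of the seed
    faces a match ('1') in the alignment; don't-care symbols ('0') of the
    seed may face anything.
    """
    must_match = [j for j, c in enumerate(seed) if c == "1"]
    last_start = len(alignment) - len(seed)
    return [
        i
        for i in range(last_start + 1)
        if all(alignment[i + j] == "1" for j in must_match)
    ]


def coverage(alignment: str, seed: str) -> int:
    """Count alignment positions covered by a must-match symbol of some hit.

    Assumed definition for this sketch; overlapping hits count each
    covered position only once.
    """
    must_match = [j for j, c in enumerate(seed) if c == "1"]
    covered = set()
    for i in seed_hits(alignment, seed):
        covered.update(i + j for j in must_match)
    return len(covered)


if __name__ == "__main__":
    alignment = "111011111"   # toy alignment with one mismatch at index 3
    seed = "1101"             # spaced seed of weight 3 and span 4

    hits = seed_hits(alignment, seed)
    print("single-hit criterion satisfied:", len(hits) >= 1)
    print("multiple-hit count:", len(hits))
    print("coverage:", coverage(alignment, seed))
```

On this toy input the seed hits at positions 1, 4 and 5, so the single-hit criterion is satisfied, the hit count is 3, and the coverage is 7. The brute-force scan is only meant to make the definitions concrete; it says nothing about how the criterion is computed over a probabilistic alignment model.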