HASKER: An efficient algorithm for string kernels. Application to polarity classification in various languages
String kernels have successfully been used for various NLP tasks, ranging from text categorization by topic to native language identification. In this paper, we present a simple and efficient algorithm for computing various spectrum string kernels. When comparing two strings, we store the p-grams in...
Saved in:
Published in | Procedia computer science Vol. 112; pp. 1755 - 1763 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published |
Elsevier B.V
2017
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | String kernels have successfully been used for various NLP tasks, ranging from text categorization by topic to native language identification. In this paper, we present a simple and efficient algorithm for computing various spectrum string kernels. When comparing two strings, we store the p-grams in the first string into a hash table, and then we apply a hash table lookup for the p-grams that occur in the second string. In terms of time, we show that our algorithm can outperform a state-of-the-art tool for computing string similarity. In terms of accuracy, we show that our approach can reach state-of-the-art performance for polarity classification in various languages. Our efficient implementation is provided online for free at http://string-kernels.herokuapp.com. |
---|---|
ISSN: | 1877-0509 1877-0509 |
DOI: | 10.1016/j.procs.2017.08.207 |