HASKER: An efficient algorithm for string kernels. Application to polarity classification in various languages

Bibliographic Details
Published in	Procedia computer science Vol. 112; pp. 1755 - 1763
Main Authors	Popescu, Marius; Grozea, Cristian; Tudor Ionescu, Radu
Format	Journal Article
Language	English
Published	Elsevier B.V. 2017
Subjects	blended spectrum kernel; intersection kernel; kernel methods; open-source code; opinion mining; polarity classification; sentiment analysis; similarity-based learning; string kernels; string kernels tool
Summary:	String kernels have successfully been used for various NLP tasks, ranging from text categorization by topic to native language identification. In this paper, we present a simple and efficient algorithm for computing various spectrum string kernels. When comparing two strings, we store the p-grams in the first string into a hash table, and then we apply a hash table lookup for the p-grams that occur in the second string. In terms of time, we show that our algorithm can outperform a state-of-the-art tool for computing string similarity. In terms of accuracy, we show that our approach can reach state-of-the-art performance for polarity classification in various languages. Our efficient implementation is provided online for free at http://string-kernels.herokuapp.com.
ISSN:	1877-0509
DOI:	10.1016/j.procs.2017.08.207
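
The summary describes the core idea as storing the p-grams of the first string in a hash table and then looking up the p-grams of the second string. The following is a minimal Python sketch of that idea, not the authors' HASKER implementation. The function names (`pgram_counts`, `spectrum_kernel`, `intersection_kernel`, `blended_spectrum_kernel`) are hypothetical, and the intersection and blended variants follow the standard definitions suggested by the subject terms, which is an assumption about what the paper covers.

```python
from collections import Counter


def pgram_counts(s: str, p: int) -> Counter:
    """Hash table mapping each p-gram of s to its number of occurrences."""
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))


def spectrum_kernel(s: str, t: str, p: int) -> int:
    """p-spectrum kernel: sum over shared p-grams of count_s * count_t."""
    table = pgram_counts(s, p)  # p-grams of the first string
    # Each occurrence of a p-gram in t adds its count in s (0 if absent).
    return sum(table.get(t[j:j + p], 0) for j in range(len(t) - p + 1))


def intersection_kernel(s: str, t: str, p: int) -> int:
    """Intersection kernel: sum over p-grams of min(count_s, count_t)."""
    counts_s = pgram_counts(s, p)
    counts_t = pgram_counts(t, p)
    # A Counter returns 0 for missing keys, so absent p-grams add nothing.
    return sum(min(c, counts_t[g]) for g, c in counts_s.items())


def blended_spectrum_kernel(s: str, t: str, p_min: int, p_max: int) -> int:
    """Blended spectrum kernel: sum of p-spectrum kernels for p_min..p_max."""
    return sum(spectrum_kernel(s, t, p) for p in range(p_min, p_max + 1))


if __name__ == "__main__":
    print(spectrum_kernel("banana", "ananas", 2))
    print(intersection_kernel("banana", "ananas", 2))
    print(blended_spectrum_kernel("banana", "ananas", 1, 2))
```

With one hash table built per string pair, the cost of a single comparison in this sketch is linear in the combined length of the two strings for a fixed p, assuming constant-time hashing of each p-gram.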