Efficient Social Network Multilingual Classification using Character, POS n-grams and Dynamic Normalization

Bibliographic Details
Published in: arXiv.org
Main Authors: González-Gallardo, Carlos-Emiliano; Torres-Moreno, Juan-Manuel; Montes Rendón, Azucena; Sierra, Gerardo
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 21.02.2017

Summary: In this paper we describe a dynamic normalization process applied to multilingual social network documents (Facebook and Twitter) to improve the performance of the author profiling task for short texts. After the normalization process, character \(n\)-grams and POS-tag \(n\)-grams are obtained to extract all the possible stylistic information encoded in the documents (emoticons, character flooding, capital letters, references to other users, hyperlinks, hashtags, etc.). Experiments with an SVM classifier reached a performance of up to 90%.
ISSN: 2331-8422
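
Illustration: the summary mentions normalizing social network text and then extracting character \(n\)-grams. The following is a minimal sketch of that idea, not the authors' implementation. The placeholder tags (`URL`, `USER`, `HASHTAG`), the regular expressions, and the function names `normalize` and `char_ngrams` are all assumptions made for this example; the summary does not specify how the paper's normalization is defined.

```python
import re
from collections import Counter

# Hypothetical patterns for the elements named in the summary
# (hyperlinks, references to other users, hashtags).
URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def normalize(text):
    """Replace hyperlinks, user references and hashtags with fixed tags.

    URLs are replaced first so that '@' or '#' inside a link is not
    mistaken for a user reference or a hashtag.
    """
    text = URL_RE.sub("URL", text)
    text = USER_RE.sub("USER", text)
    text = HASHTAG_RE.sub("HASHTAG", text)
    return text


def char_ngrams(text, n):
    """Count overlapping character n-grams of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    tweet = "Hi @bob see http://x.co #fun"
    clean = normalize(tweet)
    print(clean)
    print(char_ngrams(clean, 3).most_common(5))
```

Replacing each link, mention, or hashtag with a single tag keeps the stylistic signal (the author used one) while discarding the specific target. The resulting \(n\)-gram counts could then be fed to a classifier such as an SVM.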