Study of Machine-Learning Classifier and Feature Set Selection for Intent Classification of Korean Tweets about Food Safety
Published in | Journal of information science theory and practice Vol. 2; no. 3; pp. 29 - 39 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Daejeon: Korean Institute of Science and Technology Information, 01.09.2014; Korea Institute of Science and Technology Information (한국과학기술정보연구원) |
ISSN | 2287-9099 2287-4577 |
DOI | 10.1633/JISTaP.2014.2.3.3 |
Summary: | In recent years, several studies have proposed using the Twitter micro-blogging service to track trends in online media and discussion. In this study, we examine the use of Twitter to track discussions of food safety in the Korean language. Because keyword use in most tweets is irregular, we focus on selecting an optimal machine-learning classifier and feature set to classify the collected tweets. We build classifier models using the Naive Bayes, Naive Bayes Multinomial, Support Vector Machine, and Decision Tree algorithms, all of which show good performance. To select an optimal feature set, we construct a basic feature set as a baseline for performance comparison, against which further test feature sets can be evaluated. Experiments show that precision and F-measure are highest when a Naive Bayes Multinomial classifier is used with a test feature set built by extracting the Substantive, Predicate, Modifier, and Interjection parts of speech. |
---|---|
Bibliography: | G704-001608.2014.2.3.001 |
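The summary's best-performing setup pairs a Naive Bayes Multinomial classifier with a feature set restricted to certain parts of speech. The record does not specify the tagger, tag set, smoothing, or software used, so the following is only a minimal sketch under stated assumptions: tweets arrive already POS-tagged, the tag names (`SUBSTANTIVE`, `PREDICATE`, `MODIFIER`, `INTERJECTION`, `PARTICLE`), the labels `safety`/`other`, and the English toy tokens are all hypothetical placeholders, and add-one (Laplace) smoothing is an assumed choice. It also computes precision, recall, and F-measure for one positive class, the metrics the summary reports.

```python
import math
from collections import Counter

# Hypothetical coarse POS tags standing in for the Substantive, Predicate,
# Modifier, and Interjection classes named in the summary (assumption).
KEPT_POS = {"SUBSTANTIVE", "PREDICATE", "MODIFIER", "INTERJECTION"}


def extract_features(tagged_tokens, kept_pos=KEPT_POS):
    """Keep only the words whose POS tag belongs to the selected feature set."""
    return [word for word, pos in tagged_tokens if pos in kept_pos]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        class_counts = Counter(labels)
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
            self.vocab.update(words)
        n = len(labels)
        self.log_prior = {c: math.log(class_counts[c] / n) for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, words):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for w in words:
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / (self.total[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure (F1) for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy training data: (word, tag) pairs with an intent label.
TRAIN = [
    ([("melamine", "SUBSTANTIVE"), ("found", "PREDICATE"), ("in", "PARTICLE"), ("milk", "SUBSTANTIVE")], "safety"),
    ([("tainted", "MODIFIER"), ("beef", "SUBSTANTIVE"), ("recalled", "PREDICATE")], "safety"),
    ([("delicious", "MODIFIER"), ("lunch", "SUBSTANTIVE"), ("wow", "INTERJECTION")], "other"),
    ([("great", "MODIFIER"), ("dinner", "SUBSTANTIVE"), ("today", "SUBSTANTIVE")], "other"),
]

model = MultinomialNB().fit(
    [extract_features(tokens) for tokens, _ in TRAIN],
    [label for _, label in TRAIN],
)
```

Comparing feature sets then amounts to changing `kept_pos`, refitting, and comparing precision and F-measure on held-out tweets; a production setup would more likely use a real Korean POS tagger and an established toolkit rather than this hand-rolled classifier.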