Study of Machine-Learning Classifier and Feature Set Selection for Intent Classification of Korean Tweets about Food Safety

Bibliographic Details
Published in	Journal of Information Science Theory and Practice, Vol. 2, No. 3, pp. 29–39
Main Authors	Yeom, Ha-Neul; Hwang, Myunggwon; Hwang, Mi-Nyeong; Jung, Hanmin
Format	Journal Article
Language	English
Published	Daejeon: Korea Institute of Science and Technology Information (한국과학기술정보연구원), 01.09.2014
Subjects	Exact sciences and technology; Information and communication sciences; Information science. Documentation; Library and information science; Library and information science. General aspects; Sciences and techniques of general use; Use and user studies. Information needs; Machine-learning; Feature; Text Classification; Tweets; Twitter
ISSN	2287-9099; 2287-4577
DOI	10.1633/JISTaP.2014.2.3.3

Summary:	In recent years, several studies have proposed using the Twitter micro-blogging service to track trends in online media and discussion. In this study, we examine the use of Twitter to track discussions of food safety in Korean. Because keyword use in most tweets is irregular, we focus on selecting an optimal machine-learning classifier and feature set for classifying the collected tweets. We build classifier models using the Naive Bayes, Naive Bayes Multinomial, Support Vector Machine, and Decision Tree algorithms, all of which perform well. To select the optimal feature set, we construct a basic feature set that serves as a baseline for performance comparison, against which further test feature sets are evaluated. Experiments show that precision and F-measure are highest when a Naive Bayes Multinomial classifier is used with a test feature set built by extracting the Substantive, Predicate, Modifier, and Interjection parts of speech.
Bibliography:	G704-001608.2014.2.3.001
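The best-performing configuration in the summary (a Naive Bayes Multinomial classifier over a feature set restricted to four parts of speech) can be illustrated with the minimal sketch below. It is not the authors' implementation. The tag names (SUB, PRED, MOD, INTJ, PART, PUNCT), the intent labels `concern` and `other`, the toy tweets (English glosses standing in for Korean morphemes), and add-one smoothing are all assumptions made for illustration; a real pipeline would obtain the tags from a Korean morphological analyzer.

```python
import math
from collections import Counter, defaultdict

# Hypothetical coarse POS tags standing in for the four Korean word classes named
# in the summary (Substantive, Predicate, Modifier, Interjection). Tokens with any
# other tag, e.g. particles ("PART") or punctuation ("PUNCT"), are discarded.
KEEP_TAGS = {"SUB", "PRED", "MOD", "INTJ"}


def extract_features(tagged_tweet, keep_tags=KEEP_TAGS):
    """Bag-of-words counts over the tokens whose POS tag is in keep_tags."""
    return Counter(tok for tok, tag in tagged_tweet if tag in keep_tags)


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, feature_counts, labels):
        self.classes = sorted(set(labels))
        self.doc_count = Counter(labels)
        self.word_count = defaultdict(Counter)  # class -> token -> count
        for feats, label in zip(feature_counts, labels):
            self.word_count[label].update(feats)
        self.vocab = {tok for counts in self.word_count.values() for tok in counts}
        self.total = {c: sum(self.word_count[c].values()) for c in self.classes}
        return self

    def log_posterior(self, feats, c):
        """Unnormalised log P(c) + sum over tokens t of n_t * log P(t | c)."""
        score = math.log(self.doc_count[c] / sum(self.doc_count.values()))
        denom = self.total[c] + len(self.vocab)
        for tok, n in feats.items():
            if tok in self.vocab:  # tokens unseen in training are ignored
                score += n * math.log((self.word_count[c][tok] + 1) / denom)
        return score

    def predict(self, feats):
        return max(self.classes, key=lambda c: self.log_posterior(feats, c))


def precision_recall_f1(gold, pred, positive):
    """Precision, recall and F-measure for one target class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy, hand-made tweets: English glosses stand in for Korean morphemes, and the
# labels "concern" / "other" are hypothetical intent classes.
train = [
    ([("food", "SUB"), ("poisoning", "SUB"), ("eul", "PART"), ("worried", "PRED")], "concern"),
    ([("expired", "MOD"), ("milk", "SUB"), ("sold", "PRED"), ("ugh", "INTJ")], "concern"),
    ([("delicious", "PRED"), ("lunch", "SUB"), ("neun", "PART")], "other"),
    ([("new", "MOD"), ("restaurant", "SUB"), ("opened", "PRED"), ("wow", "INTJ")], "other"),
]
model = MultinomialNB().fit([extract_features(t) for t, _ in train], [y for _, y in train])

held_out = [
    ([("food", "SUB"), ("poisoning", "SUB"), ("worried", "PRED"), ("!", "PUNCT")], "concern"),
    ([("delicious", "PRED"), ("lunch", "SUB"), ("wow", "INTJ")], "other"),
]
predictions = [model.predict(extract_features(t)) for t, _ in held_out]
print(predictions)  # ['concern', 'other']
print(precision_recall_f1([y for _, y in held_out], predictions, positive="concern"))
```

To compare feature sets in the spirit of the summary, one could retrain the same model with a different `keep_tags` (for instance all tags, as a stand-in for a baseline feature set) and compare precision and F-measure on the same held-out tweets. The Support Vector Machine and Decision Tree classifiers named in the summary are not sketched here.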