Refinement by Filtering Translation Candidates and Similarity Based Approach to Expand Emotion Tagged Corpus

Bibliographic Details
Published in: Knowledge Discovery, Knowledge Engineering and Knowledge Management, Vol. 631, pp. 260-280
Main Authors: Matsumoto, Kazuyuki; Ren, Fuji; Yoshida, Minoru; Kita, Kenji
Format: Book Chapter
Language: English
Published: Springer International Publishing AG, Switzerland, 2016
Series: Communications in Computer and Information Science

Summary: Research on emotion estimation from text mostly relies on machine learning methods. Because machine learning requires a large amount of example data, acquiring high-quality training data has been discussed as one of its major problems. Existing language resources include emotion corpora; however, they cannot be used when the target language is different, and constructing a bilingual corpus manually is financially difficult. We propose a method to convert training data into a different language using an existing Japanese-English parallel emotion corpus. Using a bilingual dictionary, translation candidates are extracted for every word of each sentence in the corpus. The extracted candidates are then narrowed down to a set of words that contribute strongly to emotion estimation, and this set of words is used as training data. Moreover, when a large amount of unannotated linguistic resources can be obtained in one language, the words can be expanded based on their distributed representations. Using distributed representations, we can improve accuracy without decreasing the information volume of each sentence. We then attempted to expand the corpus without translating the target linguistic resource. The results of an evaluation experiment using a machine learning algorithm showed the effectiveness of the emotion corpus expanded based on the original language's unannotated sentences and their similar sentences.
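The summary outlines three steps: extracting translation candidates with a bilingual dictionary, filtering them by their contribution to emotion estimation, and expanding the data with distributed word representations and similar sentences. The following is a minimal Python sketch of those steps under loudly stated assumptions; it is not the chapter's implementation. The bilingual dictionary (`BILINGUAL_DICT`), the contribution scores (`CONTRIBUTION`), and the word vectors (`VECTORS`) are hypothetical toy data; the contribution measure is a fixed lookup standing in for whatever the authors used; and the similar-sentence step (averaged word vectors, cosine similarity, nearest labeled sentence) is one plausible choice, not the authors' confirmed procedure.

```python
import math

# All data below is HYPOTHETICAL toy data for illustration only.
# Assumed bilingual dictionary: romanized Japanese word -> English translation candidates.
BILINGUAL_DICT = {
    "ureshii": ["happy", "glad", "pleased"],  # "happy"
    "shiken": ["exam", "test"],               # "examination"
    "goukaku": ["pass", "success"],           # "passing (an exam)"
}

# Assumed contribution of each English word to emotion estimation
# (a fixed lookup standing in for whatever measure the authors used).
CONTRIBUTION = {
    "happy": 0.9, "glad": 0.8, "pleased": 0.7,
    "pass": 0.6, "success": 0.5, "exam": 0.1, "test": 0.05,
}

# Assumed word vectors, as if learned from an unannotated English corpus.
VECTORS = {
    "happy": (0.9, 0.1, 0.0),
    "glad": (0.85, 0.15, 0.0),
    "joyful": (0.88, 0.12, 0.05),
    "pass": (0.3, 0.8, 0.1),
    "success": (0.35, 0.75, 0.1),
    "sad": (-0.9, 0.1, 0.0),
}


def translation_candidates(words):
    """Collect every dictionary translation candidate for every word of a sentence."""
    return [cand for w in words for cand in BILINGUAL_DICT.get(w, [])]


def filter_candidates(candidates, threshold=0.5):
    """Keep only candidates whose assumed contribution score reaches the threshold."""
    return [c for c in candidates if CONTRIBUTION.get(c, 0.0) >= threshold]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_with_similar_words(words, min_sim=0.95):
    """Append vocabulary words whose vectors are close to a word already in the list."""
    expanded = list(words)
    for w in words:
        if w not in VECTORS:
            continue  # no vector for this word, so it cannot be expanded
        for other, vec in VECTORS.items():
            if other not in expanded and cosine(VECTORS[w], vec) >= min_sim:
                expanded.append(other)
    return expanded


def sentence_vector(words):
    """Average the vectors of the words that have one; None if none do."""
    vecs = [VECTORS[w] for w in words if w in VECTORS]
    if not vecs:
        return None
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def expand_by_similar_sentences(labeled, unlabeled, min_sim=0.9):
    """Label each unlabeled sentence with the emotion of its most similar labeled
    sentence if that similarity reaches min_sim; return the new (words, label) pairs."""
    added = []
    for sent in unlabeled:
        sv = sentence_vector(sent)
        if sv is None:
            continue  # no known words, so no similarity can be computed
        best_label, best_sim = None, min_sim
        for words, label in labeled:
            lv = sentence_vector(words)
            if lv is None:
                continue
            sim = cosine(sv, lv)
            if sim >= best_sim:
                best_label, best_sim = label, sim
        if best_label is not None:
            added.append((sent, best_label))
    return added


if __name__ == "__main__":
    kept = filter_candidates(translation_candidates(["shiken", "goukaku", "ureshii"]))
    print(kept)                             # ['pass', 'success', 'happy', 'glad', 'pleased']
    print(expand_with_similar_words(kept))  # same list with 'joyful' appended
```

In this toy run, the low-scoring candidates "exam" and "test" are dropped, and "joyful" is added because its vector is close to that of "happy". The thresholds (`threshold`, `min_sim`) are arbitrary here and would need tuning on held-out data.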
ISBN: 3319527576, 9783319527574
ISSN: 1865-0929, 1865-0937
DOI: 10.1007/978-3-319-52758-1_15