Refinement by Filtering Translation Candidates and Similarity Based Approach to Expand Emotion Tagged Corpus
Published in | Knowledge Discovery, Knowledge Engineering and Knowledge Management Vol. 631; pp. 260 - 280 |
---|---|
Main Authors | , , , |
Format | Book Chapter |
Language | English |
Published | Switzerland: Springer International Publishing AG, 2016 |
Series | Communications in Computer and Information Science |
Subjects | |
Summary: | Research on emotion estimation from text mostly relies on machine learning. Because machine learning requires a large amount of example data, how to acquire high-quality training data has been discussed as one of its major problems. Existing language resources include emotion corpora; however, they cannot be used directly when the target language is different. Constructing a bilingual corpus manually is also costly. We propose a method to convert training data into a different language using an existing Japanese-English parallel emotion corpus. With a bilingual dictionary, translation candidates are extracted for every word of each sentence in the corpus. The extracted translation candidates are then narrowed down to a set of words that contribute strongly to emotion estimation, and this set of words is used as training data. Moreover, when a large amount of unannotated linguistic resources is available in one language, the words can be expanded based on their distributed representations. By using distributed representations, accuracy can be improved without reducing the amount of information in each sentence. We then attempted to expand the corpus without translating the target linguistic resource. An evaluation experiment using a machine learning algorithm showed the effectiveness of the emotion corpus expanded with the original language's unannotated sentences and their similar sentences. |
---|---|
ISBN: | 3319527576 9783319527574 |
ISSN: | 1865-0929 1865-0937 |
DOI: | 10.1007/978-3-319-52758-1_15 |
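
The candidate extraction and filtering steps described in the summary can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corpus, the toy dictionary, the function names, and in particular the label-purity score used for filtering, which stands in for whatever contribution measure the chapter actually uses.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: tokenized Japanese sentences, one emotion label each.
CORPUS = [
    (["嬉しい", "日"], "joy"),
    (["嬉しい", "知らせ"], "joy"),
    (["悲しい", "日"], "sadness"),
    (["悲しい", "知らせ"], "sadness"),
]

# Hypothetical bilingual dictionary: each source word has several candidates.
DICTIONARY = {
    "嬉しい": ["happy", "glad"],
    "悲しい": ["sad", "sorrowful"],
    "日": ["day", "sun"],
    "知らせ": ["news", "notice"],
}


def translation_candidates(tokens, dictionary):
    """Collect every dictionary translation for every word of a sentence."""
    return [cand for tok in tokens for cand in dictionary.get(tok, [])]


def filter_candidates(corpus, dictionary, min_purity=0.8):
    """Keep candidates whose occurrences concentrate on one emotion label.

    The purity score is an assumed stand-in for a contribution measure.
    """
    counts = defaultdict(Counter)
    for tokens, label in corpus:
        for cand in translation_candidates(tokens, dictionary):
            counts[cand][label] += 1
    kept = set()
    for cand, by_label in counts.items():
        purity = max(by_label.values()) / sum(by_label.values())
        if purity >= min_purity:
            kept.add(cand)
    return kept


def convert_corpus(corpus, dictionary, kept):
    """Build target-language training examples from the filtered candidates."""
    return [
        ([c for c in translation_candidates(tokens, dictionary) if c in kept], label)
        for tokens, label in corpus
    ]


kept_words = filter_candidates(CORPUS, DICTIONARY)
converted = convert_corpus(CORPUS, DICTIONARY, kept_words)
```

In this toy setting, words such as "day" or "news" appear equally often under both labels and are dropped, while the emotion-bearing candidates are kept as features for the converted corpus.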