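Effective Extraction of Thematic Words from Book Text by Applying a Compound Noun Phrase Synthesis Method (복합 명사구 합성 방법을 적용한 효과적인 도서 본문 주제어 추출)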

Bibliographic Details
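Published in: 한국컴퓨터정보학회논문지 (Journal of the Korea Society of Computer and Information), Vol. 22, no. 3, pp. 107-113
Main Authors: 안희정 (Hee-Jeong Ahn), 김기원 (Kee-Won Kim), 김승훈 (Seung-Hoon Kim)
Format: Journal Article
Language: Korean
Published: 한국컴퓨터정보학회 (Korea Society of Computer and Information), 01.03.2017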
ISSN: 1598-849X, 2383-9945
DOI: 10.9708/jksci.2017.22.03.107

Summary: Most online bookstores provide users with bibliographic information about a book rather than concrete information such as its thematic words and atmosphere. Thematic words in particular help a user understand a book and cast a wider net when searching. In this paper, we propose an efficient method for extracting thematic words from book text by applying a compound noun and noun phrase synthesis method. Compound nouns represent the characteristics of a book in more detail than single nouns. The proposed method extracts thematic words from book text by recognizing two types of noun phrases: a single noun, and a compound noun formed by combining single nouns. The recognized single nouns, compound nouns, and noun phrases are scored with TF-IDF weights and extracted as main words. In addition, this paper suggests calculating the frequency of the subject, object, and other roles separately in the TF-IDF calculation, rather than simply summing the frequencies of all nouns. Experiments are carried out in the field of economic management, and the extracted thematic words are verified through a survey and through book search. In 9 of the 10 experimental results used in this study, the thematic words extracted by the proposed method are more effective for understanding the content. The thematic words extracted by the proposed method are also confirmed to give better book search results.
KCI Citation Count: 0
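The role-separated TF-IDF idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not give the role weights, the IDF variant, or the noun-phrase recognizer, so `ROLE_WEIGHTS`, the plain `log(N / df)` form, and the pre-tagged `(term, role)` input are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical role weights; the summary does not state the values used.
ROLE_WEIGHTS = {"subject": 2.0, "object": 1.5, "other": 1.0}


def role_weighted_tf(doc):
    """Count each term per occurrence, scaled by the weight of its role."""
    tf = Counter()
    for term, role in doc:
        tf[term] += ROLE_WEIGHTS.get(role, ROLE_WEIGHTS["other"])
    return tf


def thematic_words(docs, doc_index, top_k=3):
    """Rank the terms of one document by role-weighted TF times IDF."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update({term for term, _ in doc})
    tf = role_weighted_tf(docs[doc_index])
    scores = {t: w * math.log(n_docs / df[t]) for t, w in tf.items()}
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy corpus: each book is a list of (term, role) pairs, assumed to come
# from an upstream step that recognizes single and compound nouns.
books = [
    [("market economy", "subject"), ("market economy", "object"),
     ("price", "other"), ("company", "other")],
    [("company", "subject"), ("price", "object")],
    [("company", "other"), ("interest rate", "subject")],
]

top = thematic_words(books, 0)
print(top)
```

In this toy corpus the compound noun "market economy" ranks first for the first book, because it appears as both subject and object and occurs in no other book, while "company" scores zero because it occurs in every book.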
Bibliography: G704-001619.2017.22.3.003