Sentence embedding method and apparatus using subword embedding and skip-thought model


Bibliographic Details
Main Authors KANG BYUNG OK, SONG HWA JEON, CHUNG EUI SOK, KIM HYUN WOO, JUNG HO YOUNG, PARK JEON GUE, LEE YUN KEUN, OH YOO RHEE
Format Patent
Language English, Korean
Published 12.06.2020

Summary: Provided is a method of using the weights of the embedding values of syntagma constituent words to determine a sentence embedding value, by introducing syntagma-based position encoding into a sentence embedding method that uses subword embedding. To integrate skip-thought sentence embedding learning with the subword embedding technique, the invention provides a skip-thought sentence embedding learning method based on subword embedding, together with a multitask learning methodology that performs subword embedding learning and skip-thought sentence embedding learning simultaneously, so that sentence context information is applied to the subword embedding while the subword embedding is trained. This makes a bag-of-words style of sentence embedding applicable to agglutinative languages such as Korean. The proposed model minimizes the additional learning parameters required for sentence embedding, so that most learning results accumulate in the subword embedding parameters. A method for sentence embedding based on subword embedding and skip-thoughts includes: (1) separating an input sentence into word tokens; (2) extracting subwords from the separated words; (3) deriving subword embedding vector values by embedding the extracted subwords; (4) determining position encoding values by performing syntagma-based position encoding with fixed weight values that depend on word positions in the sentence; and (5) computing the sentence embedding from the subword embedding vector values and the position encoding values.
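The summary above amounts to a five-step pipeline. The following is a minimal, hypothetical Python sketch of that pipeline, not the patent's implementation: it assumes FastText-style character n-gram subword extraction, randomly initialized subword vectors in place of trained ones, and a geometric decay as one plausible reading of "fixed weight values according to word positions"; all names (subwords, embed_word, position_weights, sentence_embedding) and the embedding dimension are invented for illustration.

```python
# Hypothetical sketch of the claimed five-step sentence embedding pipeline.
import numpy as np

EMB_DIM = 128
rng = np.random.default_rng(0)
subword_table = {}  # subword string -> embedding vector (trained in practice)

def subwords(word, n_min=2, n_max=4):
    """Step 2: extract character n-grams, with boundary markers, from one word."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]

def embed_word(word):
    """Step 3: word vector as the mean of its subword embedding vectors.
    Random initialization stands in for trained subword embeddings."""
    vecs = []
    for sw in subwords(word):
        if sw not in subword_table:
            subword_table[sw] = rng.normal(scale=0.1, size=EMB_DIM)
        vecs.append(subword_table[sw])
    return np.mean(vecs, axis=0)

def position_weights(num_words, decay=0.9):
    """Step 4: fixed, position-dependent weights per word (syntagma).
    A geometric decay is only one plausible choice; the patent does not
    publish the actual weight values."""
    w = np.array([decay ** i for i in range(num_words)])
    return w / w.sum()

def sentence_embedding(sentence):
    """Step 5: weighted bag-of-words sentence vector from word vectors."""
    words = sentence.split()  # step 1: separate word tokens (eojeol for Korean)
    word_vecs = np.stack([embed_word(w) for w in words])
    weights = position_weights(len(words))
    return weights @ word_vecs  # (n,) @ (n, EMB_DIM) -> (EMB_DIM,)

vec = sentence_embedding("나는 오늘 학교에 갔다")
print(vec.shape)  # (128,)
```

In the patent's training scheme, the subword table would be learned jointly with a skip-thought objective (predicting neighboring sentences) in a multitask setup, which is how sentence context information reaches the subword vectors; the fixed position weights add no trainable parameters, consistent with the stated goal of accumulating most learning results in the subword embedding parameters.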
Bibliography: Application Number: KR20180154641