Caption Generation System through Animal Context-Awareness

Bibliographic Details
Main Authors	CHOI YOONA, CHAE HEE CHAN, HONG MINKI, JONGUK LEE, PARK DAI HEE, YONGWHA CHUNG
Format	Patent
Language	English, Korean
Published	11.03.2022
Subjects	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; HANDLING RECORD CARRIERS; PHYSICS; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS
Summary:	The present invention relates to a system and a method for generating a caption through animal context awareness. The system includes three modules. A feature extraction module extracts an image feature vector from the optical flow and RGB information of the image information, extracts a sound feature vector from the sound information, and extracts a sound classification feature vector from the sound feature vector. An encoding module passes the image and sound feature vectors through a hierarchical LSTM encoder with attention to detect local and global features of the object and to extract global and local context vectors. A decoding module uses the global and local context vectors from the encoding module, together with the sound classification feature vector from the feature extraction module, to obtain two captioning results, one that includes the sound classification feature vector and one that excludes it, and then combines the two to generate the final caption.
Bibliography:	Application Number: KR20200112132
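
The summary states that the decoding module combines two captioning results but does not say how. The sketch below illustrates one possible combination, assuming each decoder emits a per-step probability distribution over a shared vocabulary and that the two distributions are fused by a weighted average before a greedy word choice. The function name `combine_captions`, the `weight` parameter, and the toy vocabulary are hypothetical and do not come from the patent.

```python
from typing import List, Sequence


def combine_captions(
    probs_with_sound: Sequence[Sequence[float]],
    probs_without_sound: Sequence[Sequence[float]],
    vocab: Sequence[str],
    weight: float = 0.5,
) -> List[str]:
    """Fuse two per-step word distributions and pick one word per step.

    ASSUMPTION: weighted averaging is an illustrative choice only. The
    summary says the two captioning results are combined, not how.
    """
    if len(probs_with_sound) != len(probs_without_sound):
        raise ValueError("both decoders must emit the same number of steps")

    caption = []
    for p_with, p_without in zip(probs_with_sound, probs_without_sound):
        if len(p_with) != len(vocab) or len(p_without) != len(vocab):
            raise ValueError("each distribution must cover the whole vocabulary")
        # Weighted average of the two decoders' probabilities for this step.
        fused = [weight * a + (1.0 - weight) * b for a, b in zip(p_with, p_without)]
        # Greedy choice: index of the highest fused probability.
        best = max(range(len(vocab)), key=fused.__getitem__)
        caption.append(vocab[best])
    return caption


# Toy data (hypothetical): two decoding steps over a three-word vocabulary.
vocab = ["dog", "barks", "sleeps"]
with_sound = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]      # decoder using the sound class vector
without_sound = [[0.8, 0.1, 0.1], [0.1, 0.3, 0.6]]   # decoder ignoring it

print(" ".join(combine_captions(with_sound, without_sound, vocab)))  # prints "dog barks"
```

In this toy case the sound-aware decoder tips the second word from "sleeps" to "barks", which shows the kind of effect a sound classification feature could have on the final caption. Setting `weight` to 0.0 or 1.0 reduces the sketch to a single decoder's output.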