SAMPLE-AWARE ENTROPY REGULARIZATION TECHNIQUE FOR SAMPLE-EFFICIENT SEARCH

Bibliographic Details
Main Authors	HAN, Seungyul, SUNG, Youngchul
Format	Patent
Language	English, French, Korean
Published	23.06.2022
Subjects	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS

Summary:	The present invention relates to a sample-aware entropy regularization technique for sample-efficient search. An adaptive DAC method may comprise the steps of: storing, in an experience replay memory, an experience sample generated using an updated policy; sampling a random mini-batch of experiences from the experience replay memory; calculating a ratio function for the sampled mini-batch; updating a value function and a policy parameter for the sampled mini-batch by using the calculated ratio function; updating a ratio function parameter for the sampled mini-batch; and adjusting, for the sampled mini-batch, the relative weights of the probability distribution of the experiences in the experience replay memory and the probability distribution of the policy.
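The six steps listed in the summary can be read as one training iteration. The following is a minimal control-flow sketch only, not the patented method: the record gives no formulas, so the one-state toy problem, the mixture-ratio target `alpha * pi / (alpha * pi + (1 - alpha) * q)`, the bonus term, the `alpha` adjustment rule, and every name and constant below are assumptions made for illustration.

```python
import math
import random
from collections import Counter, deque

K = 3                      # hypothetical toy problem: one state, three actions
REWARDS = [0.1, 0.5, 0.9]  # hypothetical mean reward per action


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run(iterations=200, batch_size=8, lr=0.1, seed=0):
    rng = random.Random(seed)
    buffer = deque(maxlen=500)   # experience replay memory
    logits = [0.0] * K           # policy parameters
    value = [0.0] * K            # value-function estimates
    ratio_param = [0.5] * K      # ratio-function parameters (assumed tabular)
    alpha = 0.5                  # assumed weight of the policy distribution
    target_bonus = math.log(K)   # hypothetical target for the adjustment step

    for _ in range(iterations):
        pi = softmax(logits)

        # Step 1: generate an experience with the current policy and store it.
        a = rng.choices(range(K), weights=pi)[0]
        buffer.append((a, REWARDS[a] + rng.gauss(0.0, 0.1)))

        # Step 2: sample a random mini-batch from the replay memory.
        batch = rng.sample(list(buffer), min(batch_size, len(buffer)))

        # Step 3: evaluate the ratio function on the mini-batch.
        ratios = [ratio_param[b_a] for b_a, _ in batch]

        # Step 4: update value and policy using the ratios (placeholder rules).
        bonuses = []
        for (b_a, r), ratio in zip(batch, ratios):
            # If the ratio were exact, this would equal -log of the mixture density.
            bonus = math.log(ratio) - math.log(alpha * pi[b_a])
            bonuses.append(bonus)
            value[b_a] += lr * (r - value[b_a])
            logits[b_a] += 0.1 * lr * (r + bonus - value[b_a])

        # Step 5: move ratio parameters toward the assumed mixture-ratio target.
        counts = Counter(x for x, _ in buffer)
        for b_a, _ in batch:
            q = counts[b_a] / len(buffer)  # empirical replay distribution
            target = alpha * pi[b_a] / (alpha * pi[b_a] + (1 - alpha) * q)
            ratio_param[b_a] += lr * (target - ratio_param[b_a])

        # Step 6: adjust the mixture weight (arbitrary placeholder rule).
        mean_bonus = sum(bonuses) / len(bonuses)
        alpha = min(0.95, max(0.05, alpha + 0.01 * (target_bonus - mean_bonus)))

    return {
        "policy": softmax(logits),
        "alpha": alpha,
        "ratio_param": ratio_param,
        "buffer_size": len(buffer),
    }


if __name__ == "__main__":
    print(run())
```

Only the ordering of the six steps mirrors the summary. A real implementation would presumably use function approximators for the policy, value function, and ratio function, and update rules taken from the full patent text.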
Bibliography:	Application Number: WO2020KR95152