SAMPLE-AWARE ENTROPY REGULARIZATION TECHNIQUE FOR SAMPLE-EFFICIENT SEARCH

Bibliographic Details
Main Authors: HAN, Seungyul; SUNG, Youngchul
Format: Patent
Language: English, French, Korean
Published: 23.06.2022

Summary: The present invention relates to a sample-aware entropy regularization technique for sample-efficient search. An adaptive DAC method may comprise the steps of: storing, in an experience replay memory, an experience sample generated using an updated policy; sampling a random mini-batch of experience from the experience replay memory; calculating a ratio function for the sampled mini-batch; updating a value function and a policy parameter for the sampled mini-batch by using the calculated ratio function; updating a ratio function parameter for the sampled mini-batch; and, for the sampled mini-batch, adjusting the relative weighting of the probability distribution of the experience in the experience replay memory and the probability distribution of the policy.
Bibliography: Application Number: WO2020KR95152
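The summary lists the six steps of the adaptive DAC method but does not define the ratio function, the value and policy objectives, or the rule for adjusting the proportions. The Python sketch below is a minimal, hypothetical illustration of how the steps could fit together in a one-state, discrete-action toy problem. It is not the patented implementation. Assumptions not taken from the record: the experience and policy distributions are combined as a mixture q_mix = alpha*pi + (1 - alpha)*q, where q is the replay-memory action distribution and alpha is the policy's weight; the ratio function R(a) is a sigmoid-parameterized estimate of alpha*pi(a) / q_mix(a); the policy ascends E_pi[Q] + beta*H(q_mix); and alpha is adjusted by a simple clipped gradient step on H(q_mix). All names (`ReplayMemory`, `make_state`, `adaptive_dac_step`, `beta`, and the learning rates) are hypothetical.

```python
import math
import random
from collections import deque


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class ReplayMemory:
    """Fixed-capacity experience replay memory holding (action, reward) pairs."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, action, reward):
        self.buffer.append((action, reward))

    def sample(self, batch_size):
        # Uniform random mini-batch, drawn with replacement for simplicity.
        return random.choices(self.buffer, k=batch_size)


def make_state(num_actions, alpha=0.5):
    # Hypothetical container: theta = policy logits, eta = ratio-function logits,
    # Q = action values, alpha = policy weight in alpha*pi + (1 - alpha)*q.
    return {"theta": [0.0] * num_actions, "eta": [0.0] * num_actions,
            "Q": [0.0] * num_actions, "alpha": alpha}


def adaptive_dac_step(state, memory, reward_fn, batch_size=32, beta=0.5,
                      lr_q=0.1, lr_pi=0.1, lr_ratio=1.0, lr_alpha=0.01):
    k_range = range(len(state["theta"]))
    pi = softmax(state["theta"])
    alpha = state["alpha"]

    # 1. Store an experience sample generated with the current (updated) policy.
    action = random.choices(k_range, weights=pi)[0]
    memory.store(action, reward_fn(action))

    # 2. Sample a random mini-batch; q_hat is its empirical action distribution.
    batch = memory.sample(batch_size)
    q_hat = [0.0 for _ in k_range]
    for a, _ in batch:
        q_hat[a] += 1.0 / batch_size

    # 3. Ratio function R(a) = sigmoid(eta_a), trained (step 5) to approximate
    #    alpha*pi(a) / (alpha*pi(a) + (1 - alpha)*q(a)).
    ratio = [sigmoid(e) for e in state["eta"]]

    # 4a. Value update. One-state, one-step toy: no bootstrapped target, so
    #     Q(a) simply regresses toward observed rewards (ratio not used here).
    for a, r in batch:
        state["Q"][a] += lr_q * (r - state["Q"][a])

    # 4b. Policy update: ascend E_pi[Q] + beta * H(q_mix), using
    #     log q_mix(a) = log alpha + log pi(a) - log R(a) (exact once R hits its
    #     target). Action-independent constants cancel in the softmax gradient.
    g = [state["Q"][k] - beta * alpha * (math.log(pi[k]) - math.log(ratio[k]))
         for k in k_range]
    g_mean = sum(p * gk for p, gk in zip(pi, g))
    for k in k_range:
        state["theta"][k] += lr_pi * pi[k] * (g[k] - g_mean)

    # 5. Ratio-parameter update: gradient ascent on
    #    alpha * E_pi[log R] + (1 - alpha) * E_q[log(1 - R)].
    for k in k_range:
        grad = alpha * pi[k] * (1.0 - ratio[k]) - (1.0 - alpha) * q_hat[k] * ratio[k]
        state["eta"][k] += lr_ratio * grad

    # 6. Adjust the proportions (hypothetical rule): step alpha along
    #    dH(q_mix)/d(alpha) = -sum_a (pi(a) - q(a)) * log q_mix(a), then clip.
    q_mix = [alpha * p + (1.0 - alpha) * q for p, q in zip(pi, q_hat)]
    d_alpha = -sum((p - q) * math.log(m) for p, q, m in zip(pi, q_hat, q_mix))
    state["alpha"] = min(0.95, max(0.05, alpha + lr_alpha * d_alpha))
```

As a usage example, seeding `random`, creating `ReplayMemory(1000)` and `make_state(3)`, and calling `adaptive_dac_step(state, memory, lambda a: [0.1, 0.5, 1.0][a])` a few thousand times should make the policy favor the highest-reward action. Because the entropy term is computed on the mixture with the replay distribution rather than on the policy alone, the policy is additionally pushed toward actions that are under-represented in the memory. That is one reading of "sample-aware" under the assumptions above.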