SAMPLE-AWARE ENTROPY REGULARIZATION TECHNIQUE FOR SAMPLE-EFFICIENT SEARCH
Main Authors | , |
---|---|
Format | Patent |
Language | English, French, Korean |
Published | 23.06.2022 |
Summary: | The present invention relates to a sample-aware entropy regularization technique for sample-efficient search. An adaptive DAC method may comprise the steps of: storing, in an experience replay memory, an experience sample generated using an updated policy; sampling a random mini-batch of experiences from the experience replay memory; calculating a ratio function for the sampled mini-batch; updating a value function and a policy parameter for the sampled mini-batch by using the calculated ratio function; updating a ratio function parameter for the sampled mini-batch; and adjusting, for the sampled mini-batch, the relative weights of the probability distribution of experiences in the experience replay memory and the probability distribution of the policy. |
---|---|
Bibliography: | Application Number: WO2020KR95152 |
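
The six steps listed in the summary can be read as one training loop. The Python sketch below is only an illustration of that control flow on a toy single-state, discrete-action problem; it is not the claimed method. The class name `AdaptiveDACSketch`, the sigmoid-parameterized ratio function, the bonus term, the step sizes, and the rule used to adjust the weight `alpha` are all assumptions introduced here, since the record gives no formulas.

```python
from collections import deque

import numpy as np


def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()


class AdaptiveDACSketch:
    """Toy skeleton of the six listed steps (all update rules are placeholders)."""

    def __init__(self, n_actions=4, alpha=0.5, lr=0.1, capacity=1000, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_actions = n_actions
        self.alpha = alpha                        # assumed weight on the policy distribution
        self.lr = lr
        self.logits = np.zeros(n_actions)         # policy parameters
        self.q_values = np.zeros(n_actions)       # value-function parameters
        self.ratio_params = np.zeros(n_actions)   # ratio-function parameters
        self.buffer = deque(maxlen=capacity)      # experience replay memory

    def policy(self):
        return softmax(self.logits)

    def ratio(self, actions):
        # Assumed form: a sigmoid keeps the ratio strictly inside (0, 1).
        return 1.0 / (1.0 + np.exp(-self.ratio_params[actions]))

    def act_and_store(self, reward_fn):
        # Step 1: generate an experience with the current policy and store it.
        action = int(self.rng.choice(self.n_actions, p=self.policy()))
        self.buffer.append((action, reward_fn(action)))

    def update(self, batch_size=32):
        # Step 2: sample a random mini-batch from the replay memory.
        idx = self.rng.integers(len(self.buffer), size=batch_size)
        actions = np.array([self.buffer[int(i)][0] for i in idx])
        rewards = np.array([self.buffer[int(i)][1] for i in idx], dtype=float)

        # Step 3: evaluate the ratio function on the mini-batch.
        batch_ratio = self.ratio(actions)

        # Step 4: update the value function, then the policy, using the ratio.
        for a, r in zip(actions, rewards):
            self.q_values[a] += self.lr * (r - self.q_values[a])
        pi = self.policy()
        all_ratio = self.ratio(np.arange(self.n_actions))
        # Placeholder bonus: -log of the mixture implied by ratio = alpha*pi/mixture.
        bonus = np.log(all_ratio) - np.log(self.alpha * pi)
        score = self.q_values + 0.1 * bonus
        self.logits += self.lr * pi * (score - pi @ score)

        # Step 5: update the ratio parameters toward a batch-based target.
        q_hat = np.bincount(actions, minlength=self.n_actions) / batch_size
        pi = self.policy()
        target = self.alpha * pi / (self.alpha * pi + (1 - self.alpha) * q_hat)
        self.ratio_params += self.lr * (target - all_ratio)

        # Step 6: adjust the weight between the two distributions (placeholder rule).
        new_alpha = self.alpha + 0.01 * (0.5 - batch_ratio.mean())
        self.alpha = float(np.clip(new_alpha, 0.05, 0.95))


if __name__ == "__main__":
    agent = AdaptiveDACSketch()
    for _ in range(200):
        agent.act_and_store(lambda a: 1.0 if a == 2 else 0.0)
        agent.update()
    print("policy:", agent.policy(), "alpha:", agent.alpha)
```

The sketch keeps each step as a separate, commented stage so that any placeholder rule can be swapped for the formulas in the full patent text without changing the loop structure.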