METHOD AND APPARATUS FOR ADAPTIVE MULTI-BATCH EXPERIENCE REPLAY FOR CONTINUOUS ACTION CONTROL


Bibliographic Details
Main Authors	SUNG YOUNGCHUL, HAN SEUNGYUL
Format	Patent
Language	English, Korean
Published	30.12.2019
Subjects	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS
Summary:	The present invention relates to an adaptive multi-batch experience replay (AMBER) technique for continuous action space control that improves sample efficiency. The AMBER method comprises: storing information tuples of samples generated under the updated policy in a replay memory as multiple batches; adjusting the size of the random mini-batch to reduce the average importance sampling weight; calculating the average importance sampling weight of each sample batch in the replay memory; dropping from the replay memory any batch whose calculated average importance sampling weight exceeds a predesignated batch drop coefficient; and performing random mini-batch sampling over the batches that were not dropped in order to update the parameters.
Bibliography:	Application Number: KR20180102008
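
The batch-drop and sampling steps named in the summary can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the tuple layout, the function names, and the choice to take the per-batch average as the plain mean of the ratio between the current policy's probability and the stored behavior policy's probability are all hypothetical. The mini-batch size adjustment step is not modeled; the size is simply passed in.

```python
import math
import random

# Hypothetical tuple layout (not specified in the record):
# (state, action, reward, next_state, behavior_log_prob)


def average_is_weight(batch, log_prob_fn):
    """Mean importance sampling ratio pi_current / pi_behavior over one batch.

    log_prob_fn(state, action) is assumed to return the log-probability of
    the action under the current policy.
    """
    ratios = [
        math.exp(log_prob_fn(state, action) - behavior_log_prob)
        for (state, action, _reward, _next_state, behavior_log_prob) in batch
    ]
    return sum(ratios) / len(ratios)


def drop_stale_batches(memory, log_prob_fn, drop_coeff):
    """Keep only batches whose average weight does not exceed drop_coeff."""
    return [b for b in memory if average_is_weight(b, log_prob_fn) <= drop_coeff]


def sample_minibatch(memory, log_prob_fn, drop_coeff, size, rng):
    """Draw a random mini-batch from the batches that survive the drop."""
    kept = drop_stale_batches(memory, log_prob_fn, drop_coeff)
    pool = [sample for batch in kept for sample in batch]
    if not pool:
        return []
    return rng.sample(pool, min(size, len(pool)))


if __name__ == "__main__":
    # Toy current policy that assigns log-probability 0.0 to every action.
    current_log_prob = lambda state, action: 0.0
    fresh = [(0.0, 0.1, 1.0, 0.2, 0.0), (0.2, 0.3, 0.5, 0.4, 0.0)]
    stale = [(0.0, 0.9, 0.0, 0.1, -2.0), (0.1, 0.8, 0.0, 0.2, -2.0)]
    minibatch = sample_minibatch(
        [fresh, stale], current_log_prob, drop_coeff=1.5, size=2,
        rng=random.Random(0),
    )
    print(minibatch)
```

In the toy data the second batch was generated by a policy far from the current one, so its average ratio is well above the assumed drop coefficient of 1.5 and only samples from the first batch can appear in the mini-batch.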