Off-line learning for robot control using a reward prediction model

Bibliographic Details
Main Authors	ZOLNA KONRAD, REED SCOTT ELLISON
Format	Patent
Language	Korean
Published	28.02.2023
Subjects	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS

Summary:	Methods, systems, and apparatus, including computer programs encoded on computer storage media, for off-line learning using a reward prediction model. One of the methods includes obtaining robot experience data; training, on a first subset of the robot experience data, a reward prediction model that receives a reward input comprising an input observation and generates as output a reward prediction that is a prediction of a task-specific reward for a particular task that should be assigned to the input observation; processing experiences in the robot experience data using the trained reward prediction model to generate a respective reward prediction for each of the processed experiences; and training a policy neural network on (i) the processed experiences and (ii) the respective reward predictions for the processed experiences.
Bibliography:	Application Number: KR20237002829
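
Illustration:	The four steps named in the Summary (obtain experience data, train a reward prediction model on a first subset, generate reward predictions for the experiences, train a policy on the experiences and their predicted rewards) can be sketched on toy data. This is a minimal sketch under loud assumptions, not the patented method: the one-dimensional observations, the synthetic actions, the annotation rule `obs > 0`, and the names `train_reward_model` and `fit_policy` are all hypothetical. A logistic regression stands in for the reward prediction model, and a linear policy fitted by reward-weighted regression stands in for the policy neural network.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_reward_model(annotated, epochs=500, lr=0.5):
    """Fit r_hat(obs) = sigmoid(w * obs + b) by full-batch gradient descent.

    `annotated` is a list of (observation, task_reward) pairs.
    """
    w, b = 0.0, 0.0
    n = len(annotated)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for obs, reward in annotated:
            err = sigmoid(w * obs + b) - reward  # gradient of the log loss
            grad_w += err * obs
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return lambda obs: sigmoid(w * obs + b)


def fit_policy(experiences, rewards):
    """Reward-weighted linear regression: action = slope * obs + intercept.

    Experiences with higher predicted reward pull the fit more strongly.
    """
    total = sum(rewards)
    mean_obs = sum(r * o for (o, _), r in zip(experiences, rewards)) / total
    mean_act = sum(r * a for (_, a), r in zip(experiences, rewards)) / total
    cov = sum(r * (o - mean_obs) * (a - mean_act)
              for (o, a), r in zip(experiences, rewards))
    var = sum(r * (o - mean_obs) ** 2 for (o, _), r in zip(experiences, rewards))
    slope = cov / var
    intercept = mean_act - slope * mean_obs
    return lambda obs: slope * obs + intercept


# Step 1: obtain experience data (synthetic here; a hypothetical stand-in
# for logged robot experience). Each experience is (observation, action).
rng = random.Random(0)
experiences = []
for _ in range(200):
    obs = rng.uniform(-1.0, 1.0)
    action = (1.0 if obs > 0 else -1.0) + rng.gauss(0.0, 0.1)
    experiences.append((obs, action))

# Step 2: train the reward model on a first subset that carries
# task-specific reward annotations (assumed rule: reward 1 when obs > 0).
annotated = [(obs, 1.0 if obs > 0 else 0.0) for obs, _ in experiences[:50]]
reward_model = train_reward_model(annotated)

# Step 3: process every experience with the trained reward model.
predicted_rewards = [reward_model(obs) for obs, _ in experiences]

# Step 4: train the policy on the experiences and their predicted rewards.
policy = fit_policy(experiences, predicted_rewards)
```

Only the first 50 experiences need reward annotations in this sketch; the remaining 150 receive their rewards from the trained model, which is the point of the relabeling step.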