REINFORCEMENT LEARNING METHODS, REINFORCEMENT LEARNING PROGRAMS, AND REINFORCEMENT LEARNING SYSTEMS

Bibliographic Details
Main Authors: IWANE HIDENAO, OKAWA YOSHIHIRO, SASAKI TOMOTAKE, YANAMI HITOSHI
Format: Patent
Language: English, Japanese
Published: 10.09.2020
Summary: PROBLEM: To improve the probability of satisfying a constraint condition. SOLUTION: When determining the control input to a control target 110, an information processor 100 calculates a degree of risk for the current state of the control target 110 with respect to a constraint condition on the state of the control target 110, based on a predicted value of the state of the control target 110 at a future point in time. The information processor 100 then determines the control input to the control target 110 at the current time from within a range that is set according to the calculated degree of risk. SELECTED DRAWING: Figure 2
Bibliography: Application Number: JP20190039032
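
Illustrative sketch: The two steps named in the Summary (calculate a risk from a predicted future state, then pick the control input from a range that depends on that risk) can be sketched roughly as follows. This is not the patented method. The abstract does not give the prediction model, the risk formula, or the rule that sets the range, so everything here is an assumption made for illustration: a scalar state with a single upper-limit constraint, a hypothetical linear model in predict_next_state, a risk clipped to [0, 1], and a search range around a nominal input that shrinks as the risk grows. All function and parameter names are hypothetical.

```python
import random


def predict_next_state(state, prev_input, a=0.9, b=0.5):
    """Hypothetical one-step linear model x' = a*x + b*u (an assumption)."""
    return a * state + b * prev_input


def risk_level(predicted_state, upper_limit, margin=1.0):
    """Assumed risk measure in [0, 1] for the constraint state <= upper_limit.

    Risk is 0 when the predicted state is at least `margin` below the limit
    and 1 when the predicted state reaches or exceeds the limit.
    """
    slack = upper_limit - predicted_state
    return min(1.0, max(0.0, 1.0 - slack / margin))


def input_range(nominal_input, risk, max_width=2.0):
    """Assumed rule: the range around the nominal input narrows as risk rises."""
    half_width = 0.5 * max_width * (1.0 - risk)
    return nominal_input - half_width, nominal_input + half_width


def choose_control_input(state, prev_input, nominal_input, upper_limit, rng=random):
    """Predict, score the risk, then sample the input from the risk-dependent range."""
    predicted = predict_next_state(state, prev_input)
    risk = risk_level(predicted, upper_limit)
    low, high = input_range(nominal_input, risk)
    return rng.uniform(low, high), risk, (low, high)


if __name__ == "__main__":
    for x in (0.0, 10.0, 12.0):
        u, risk, bounds = choose_control_input(
            state=x, prev_input=0.0, nominal_input=0.0, upper_limit=9.5
        )
        print(f"state={x} risk={risk:.2f} range={bounds} input={u:.3f}")
```

In this sketch a state far from the limit leaves the full range open for exploration, while a state whose predicted successor violates the limit collapses the range to the nominal input alone. Other choices (a probabilistic risk, a multi-step prediction, a vector state) would fit the Summary equally well.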