REINFORCEMENT LEARNING METHODS, REINFORCEMENT LEARNING PROGRAMS, AND REINFORCEMENT LEARNING SYSTEMS
Main Authors | , , , |
---|---|
Format | Patent |
Language | English Japanese |
Published | 10.09.2020 |
Subjects | |
Summary: | PROBLEM: To improve the probability of satisfying a constraint condition. SOLUTION: When determining a control input to a control target 110, an information processor 100 calculates, based on a predicted value of the state of the control target 110 at a future time point, a degree of risk for the current state of the control target 110 with respect to a constraint condition on that state. The information processor 100 then determines the current control input to the control target 110 from within a range that depends on the calculated degree of risk. SELECTED DRAWING: Figure 2
【課題】制約条件を充足する確率の向上を図ること。【解決手段】情報処理装置100は、制御対象110への制御入力を決定するにあたり、将来の時点における制御対象110の状態の予測値に基づいて、制御対象110の状態に関する制約条件に対する、現在の時点における制御対象110の状態についての危険度を算出する。情報処理装置100は、算出した危険度に応じて定まる範囲の中から、現在の時点における制御対象110への制御入力を決定する。【選択図】図2 |
---|---|
Bibliography: | Application Number: JP20190039032 |
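
The summary describes two steps: computing a degree of risk from a predicted future state, and then choosing the current control input from a range that depends on that risk. The record gives no formulas, so the following Python sketch is only an illustration under stated assumptions: a scalar state with a single upper-bound constraint, a hypothetical linear prediction model, a risk value that grows linearly as the predicted state approaches the bound, and an input range that shrinks around a nominal input as risk rises. All function names, parameters, and formulas here are hypothetical and are not taken from the patent.

```python
import random


def predict_next_state(state, nominal_input, a=1.0, b=1.0):
    """Assumed linear model x_next = a * x + b * u (not from the patent)."""
    return a * state + b * nominal_input


def risk_level(predicted_state, upper_bound, margin):
    """Assumed risk in [0, 1]: 0 when the prediction is at least `margin`
    below the bound, 1 when it reaches or exceeds the bound.
    `margin` must be positive."""
    slack = upper_bound - predicted_state
    return min(1.0, max(0.0, 1.0 - slack / margin))


def choose_control_input(state, nominal_input, upper_bound,
                         margin=1.0, max_spread=0.5, rng=random):
    """Pick the current input from a range whose width depends on the risk."""
    predicted = predict_next_state(state, nominal_input)
    risk = risk_level(predicted, upper_bound, margin)
    # Higher risk -> narrower range around the nominal input.
    spread = max_spread * (1.0 - risk)
    control = rng.uniform(nominal_input - spread, nominal_input + spread)
    return control, risk, spread
```

Calling `choose_control_input(0.0, 0.0, upper_bound=1.0)` in this sketch draws an input from the full range around the nominal input, while a state already at the bound collapses the range to the nominal input alone. Narrowing the range as risk grows is one plausible reading of "the range determined depending on the calculated risk"; the actual mapping is defined in the patent's full text.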