Stochastic Optimal Control for Multivariable Dynamical Systems Using Expectation Maximization

Bibliographic Details
Published in	IEEE Transactions on Neural Networks and Learning Systems, Vol. 34, no. 9, pp. 5268-5282
Main Authors	Mallick, Prakash; Chen, Zhiyong
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.09.2023
Subjects	Covariance matrix; Dynamical systems; Expectation maximization (EM); Exploitation; Iterative methods; Markov processes; Maximization; Maximum likelihood; Maximum likelihood estimation; Multivariable control; Noise measurement; Optimal control; Optimization; Parameter estimation; Reinforcement learning; Stochastic models; Stochastic processes; Stochastic systems; Stochasticity; Trajectory optimization

Summary:	Trajectory optimization is a fundamental stochastic optimal control (SOC) problem. This article deals with a trajectory optimization approach for dynamical systems subject to measurement noise that can be fitted into linear time-varying stochastic models. Exact/complete solutions to this kind of control problem have been deemed analytically intractable in the literature because they fall under the category of partially observable Markov decision processes (MDPs). Therefore, effective solutions with reasonable approximations are widely sought. We propose a reformulation of stochastic control in a reinforcement learning setting. This formulation combines the benefits of the conventional optimal control procedure with the advantages of maximum likelihood approaches. Finally, an iterative trajectory optimization paradigm called SOC-expectation maximization (SOC-EM) is put forth. This trajectory optimization procedure exhibits better performance in terms of reduction in cumulative cost-to-go, which is proven both theoretically and empirically. Furthermore, we provide novel theoretical work related to the uniqueness of control parameter estimates. An analysis of the control covariance matrix is presented, which handles stochasticity by efficiently balancing exploration and exploitation.
ISSN:	2162-237X, 2162-2388
DOI:	10.1109/TNNLS.2022.3190246