Maximum Entropy Optimal Density Control of Discrete-Time Linear Systems and Schrödinger Bridges

Bibliographic Details
Published in IEEE transactions on automatic control Vol. 69; no. 3; pp. 1 - 16
Main Authors Ito, Kaito; Kashima, Kenji
Format Journal Article
Language English
Published New York IEEE 01.03.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Abstract We consider an entropy-regularized version of optimal density control of deterministic discrete-time linear systems. Entropy regularization, or a maximum entropy (MaxEnt) method for optimal control, has attracted much attention especially in reinforcement learning due to its many advantages, such as a natural exploration strategy. Despite the merits, high-entropy control policies induced by the regularization introduce probabilistic uncertainty into systems, which severely limits the applicability of MaxEnt optimal control to safety-critical systems. To remedy this situation, we impose a Gaussian density constraint at a specified time on the MaxEnt optimal control to directly control state uncertainty. Specifically, we derive the explicit form of the MaxEnt optimal density control. In addition, we also consider the case where density constraints are replaced by fixed-point constraints. Then, we characterize the associated state process as a pinned process, which is a generalization of the Brownian bridge to linear systems. Finally, we reveal that the MaxEnt optimal density control gives the so-called Schrödinger bridge associated with a discrete-time linear system.
Author Kashima, Kenji
Ito, Kaito
Author_xml – sequence: 1
  givenname: Kaito
  orcidid: 0000-0003-2913-4953
  surname: Ito
  fullname: Ito, Kaito
  organization: School of Computing, Tokyo Institute of Technology, Yokohama, Japan
– sequence: 2
  givenname: Kenji
  orcidid: 0000-0002-2963-2584
  surname: Kashima
  fullname: Kashima, Kenji
  organization: Graduate School of Informatics, Kyoto University, Kyoto, Japan
CODEN IETAA9
CitedBy_id crossref_primary_10_1109_LCSYS_2024_3397228
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DOI 10.1109/TAC.2023.3305319
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Xplore Digital Library
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Mechanical & Transportation Engineering Abstracts
Technology Research Database
Engineering Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
Mechanical & Transportation Engineering Abstracts
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Engineering Research Database
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
DatabaseTitleList Technology Research Database

Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library Online
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1558-2523
EndPage 16
ExternalDocumentID 10_1109_TAC_2023_3305319
10218737
Genre orig-research
GrantInformation_xml – fundername: JST, ACT-X
  grantid: JPMJAX2102
– fundername: JSPS KAKENHI
  grantid: JP21J14577; JP21H04875
ID FETCH-LOGICAL-c292t-461364af2160e4797318051456b2369ed20d4e7078dff0b0720c0cdcaeb60c883
IEDL.DBID RIE
ISSN 0018-9286
IngestDate Thu Oct 10 18:19:48 EDT 2024
Fri Aug 23 02:38:11 EDT 2024
Mon Nov 04 12:13:48 EST 2024
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c292t-461364af2160e4797318051456b2369ed20d4e7078dff0b0720c0cdcaeb60c883
ORCID 0000-0002-2963-2584
0000-0003-2913-4953
PQID 2932574818
PQPubID 85475
PageCount 16
ParticipantIDs ieee_primary_10218737
crossref_primary_10_1109_TAC_2023_3305319
proquest_journals_2932574818
PublicationCentury 2000
PublicationDate 2024-03-01
PublicationDateYYYYMMDD 2024-03-01
PublicationDate_xml – month: 03
  year: 2024
  text: 2024-03-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on automatic control
PublicationTitleAbbrev TAC
PublicationYear 2024
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SSID ssj0016441
Score 2.491092
SourceID proquest
crossref
ieee
SourceType Aggregation Database
Publisher
StartPage 1
SubjectTerms Bridges
Control systems
Costs
Density
Discrete time systems
Entropy
Gaussian distribution
Linear systems
Maximum entropy
Optimal control
Regularization
Safety critical
Schrödinger bridge
Stochastic control
Stochastic processes
Uncertainty
Title Maximum Entropy Optimal Density Control of Discrete-Time Linear Systems and Schrödinger Bridges
URI https://ieeexplore.ieee.org/document/10218737
https://www.proquest.com/docview/2932574818
Volume 69