Maximum Entropy Optimal Density Control of Discrete-Time Linear Systems and Schrödinger Bridges
Published in | IEEE transactions on automatic control Vol. 69; no. 3; pp. 1 - 16 |
---|---|
Main Authors | Ito, Kaito; Kashima, Kenji |
Format | Journal Article |
Language | English |
Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.03.2024 |
Abstract | We consider an entropy-regularized version of optimal density control of deterministic discrete-time linear systems. Entropy regularization, or a maximum entropy (MaxEnt) method for optimal control, has attracted much attention, especially in reinforcement learning, due to its many advantages, such as a natural exploration strategy. Despite the merits, high-entropy control policies induced by the regularization introduce probabilistic uncertainty into systems, which severely limits the applicability of MaxEnt optimal control to safety-critical systems. To remedy this situation, we impose a Gaussian density constraint at a specified time on the MaxEnt optimal control to directly control state uncertainty. Specifically, we derive the explicit form of the MaxEnt optimal density control. In addition, we also consider the case where density constraints are replaced by fixed-point constraints. Then, we characterize the associated state process as a pinned process, which is a generalization of the Brownian bridge to linear systems. Finally, we reveal that the MaxEnt optimal density control gives the so-called Schrödinger bridge associated with a discrete-time linear system. |
---|---|
Author | Kashima, Kenji; Ito, Kaito |
Author_xml | 1. Ito, Kaito (ORCID 0000-0003-2913-4953), School of Computing, Tokyo Institute of Technology, Yokohama, Japan; 2. Kashima, Kenji (ORCID 0000-0002-2963-2584), Graduate School of Informatics, Kyoto University, Kyoto, Japan |
CODEN | IETAA9 |
CitedBy_id | crossref_primary_10_1109_LCSYS_2024_3397228 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 7TB 8FD FR3 JQ2 L7M L~C L~D |
DOI | 10.1109/TAC.2023.3305319 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005-present; IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Xplore Digital Library; CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Mechanical & Transportation Engineering Abstracts; Technology Research Database; Engineering Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional |
DatabaseTitle | CrossRef; Technology Research Database; Computer and Information Systems Abstracts – Academic; Mechanical & Transportation Engineering Abstracts; Electronics & Communications Abstracts; ProQuest Computer Science Collection; Computer and Information Systems Abstracts; Engineering Research Database; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Professional |
DatabaseTitleList | Technology Research Database |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library Online url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering |
EISSN | 1558-2523 |
EndPage | 16 |
ExternalDocumentID | 10_1109_TAC_2023_3305319 10218737 |
Genre | orig-research |
GrantInformation_xml | JST, ACT-X (grant JPMJAX2102); JSPS KAKENHI (grants JP21J14577, JP21H04875) |
IEDL.DBID | RIE |
ISSN | 0018-9286 |
IngestDate | Thu Oct 10 18:19:48 EDT 2024 Fri Aug 23 02:38:11 EDT 2024 Mon Nov 04 12:13:48 EST 2024 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 3 |
Language | English |
LinkModel | DirectLink |
ORCID | 0000-0002-2963-2584 0000-0003-2913-4953 |
PQID | 2932574818 |
PQPubID | 85475 |
PageCount | 16 |
ParticipantIDs | ieee_primary_10218737 crossref_primary_10_1109_TAC_2023_3305319 proquest_journals_2932574818 |
PublicationCentury | 2000 |
PublicationDate | 2024-03-01 |
PublicationDateYYYYMMDD | 2024-03-01 |
PublicationDate_xml | – month: 03 year: 2024 text: 2024-03-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | New York |
PublicationPlace_xml | – name: New York |
PublicationTitle | IEEE transactions on automatic control |
PublicationTitleAbbrev | TAC |
PublicationYear | 2024 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
SSID | ssj0016441 |
Score | 2.491092 |
SourceID | proquest crossref ieee |
SourceType | Aggregation Database Publisher |
StartPage | 1 |
SubjectTerms | Bridges; Control systems; Costs; Density; Discrete time systems; Entropy; Gaussian distribution; Linear systems; Maximum entropy; Optimal control; Regularization; Safety critical; Schrödinger bridge; stochastic control; Stochastic processes; Uncertainty |
Title | Maximum Entropy Optimal Density Control of Discrete-Time Linear Systems and Schrödinger Bridges |
URI | https://ieeexplore.ieee.org/document/10218737 https://www.proquest.com/docview/2932574818 |
Volume | 69 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | IEEE |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Maximum+Entropy+Optimal+Density+Control+of+Discrete-Time+Linear+Systems+and+Schr%C3%B6dinger+Bridges&rft.jtitle=IEEE+transactions+on+automatic+control&rft.au=Ito%2C+Kaito&rft.au=Kashima%2C+Kenji&rft.date=2024-03-01&rft.pub=The+Institute+of+Electrical+and+Electronics+Engineers%2C+Inc.+%28IEEE%29&rft.issn=0018-9286&rft.eissn=1558-2523&rft.volume=69&rft.issue=3&rft.spage=1536&rft_id=info:doi/10.1109%2FTAC.2023.3305319&rft.externalDBID=NO_FULL_TEXT |
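The abstract characterizes the state process under a fixed-point constraint as a pinned process, a generalization of the Brownian bridge to linear systems. As a rough illustration only, the sketch below computes the conditional mean and variance of a scalar linear system x_{k+1} = a x_k + b u_k, driven by i.i.d. Gaussian inputs u_k ~ N(0, s2) and conditioned on a terminal pin x_N = x_f, using standard Gaussian conditioning. The model, the parameter names `a`, `b`, `s2`, and the helpers `gramian` and `pinned_moments` are hypothetical and chosen for illustration; this is not the paper's MaxEnt construction, which treats vector systems with an entropy-regularized cost.

```python
# Sketch (illustrative assumption, not the paper's method): moments of a
# scalar discrete-time pinned process obtained by Gaussian conditioning.
# Assumed model: x_{k+1} = a*x_k + b*u_k, u_k ~ N(0, s2) i.i.d., x_0 fixed,
# conditioned on the terminal pin x_n = xf.


def gramian(a: float, b: float, s2: float, k: int) -> float:
    """Var(x_k | x_0) = sum_{i=0}^{k-1} a^(2(k-1-i)) * b^2 * s2 (0 when k = 0)."""
    return sum(a ** (2 * (k - 1 - i)) * b * b * s2 for i in range(k))


def pinned_moments(a: float, b: float, s2: float, x0: float, xf: float, n: int):
    """Conditional means and variances of x_k given x_0 = x0 and x_n = xf, for k = 0..n."""
    g_n = gramian(a, b, s2, n)
    if g_n <= 0:
        raise ValueError("x_n has zero variance, so it cannot be pinned")
    means, variances = [], []
    for k in range(n + 1):
        g_k = gramian(a, b, s2, k)
        # x_n = a^(n-k) * x_k + (inputs after step k, independent of x_k),
        # hence Cov(x_k, x_n | x_0) = a^(n-k) * Var(x_k | x_0).
        c = a ** (n - k) * g_k
        # Standard Gaussian conditioning on the event x_n = xf.
        means.append(a ** k * x0 + c / g_n * (xf - a ** n * x0))
        variances.append(g_k - c * c / g_n)
    return means, variances


if __name__ == "__main__":
    # a = b = s2 = 1 reduces to a discrete-time Brownian bridge from 0 to 4.
    means, variances = pinned_moments(1.0, 1.0, 1.0, x0=0.0, xf=4.0, n=4)
    print(means)      # [0.0, 1.0, 2.0, 3.0, 4.0]
    print(variances)  # [0.0, 0.75, 1.0, 0.75, 0.0]
```

With a = b = s2 = 1 the mean interpolates linearly between the endpoints and the variance is k(N - k)/N, the familiar Brownian-bridge profile; other values of a and b reshape both curves while the variance stays pinned to zero at k = 0 and k = N.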