On Minimizing Total Discounted Cost in MDPs Subject to Reachability Constraints
Published in | IEEE Transactions on Automatic Control, Vol. 69, No. 9, pp. 6466-6473 |
---|---|
Main Authors | Savas, Yagiz; Verginis, Christos K.; Hibbard, Michael; Topcu, Ufuk |
Format | Journal Article |
Language | English |
Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.09.2024 |
ISSN | 0018-9286 (print); 1558-2523 (electronic) |
DOI | 10.1109/TAC.2024.3384834 |
Abstract | In this article, we study the synthesis of a policy in a Markov decision process (MDP) following which an agent reaches a target state in the MDP while minimizing its total discounted cost. The problem combines a reachability criterion with a discounted cost criterion and naturally expresses the completion of a task with probabilistic guarantees and optimal transient performance. We first establish that an optimal policy for the considered formulation may not exist but that there always exists a near-optimal stationary policy. We additionally provide a necessary and sufficient condition for the existence of an optimal policy. We then restrict our attention to stationary deterministic policies and show that the decision problem associated with the synthesis of an optimal stationary deterministic policy is NP-complete. Finally, we provide an exact algorithm based on mixed-integer linear programming and propose an efficient approximation algorithm based on linear programming for the synthesis of an optimal stationary deterministic policy. |
---|---|
Author | Savas, Yagiz; Topcu, Ufuk; Hibbard, Michael; Verginis, Christos K. |
Author Details | (1) Yagiz Savas, ORCID 0000-0003-2976-0786, yagiz.savas@utexas.edu; (2) Christos K. Verginis, ORCID 0000-0002-4289-2866, christos.verginis@austin.utexas.edu; (3) Michael Hibbard, ORCID 0000-0002-4697-4551, mhibbard@utexas.edu; (4) Ufuk Topcu, ORCID 0000-0003-0819-9985, utopcu@utexas.edu. All authors: University of Texas at Austin, Austin, TX, USA |
CODEN | IETAA9 |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
Discipline | Engineering |
Genre | orig-research |
Grant Information | ARL: W911NF-17-2-0181; DARPA: D19AP00004; AFRL: FA9550-19-1-0169 |
IsPeerReviewed | true |
IsScholarly | true |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html; https://doi.org/10.15223/policy-029; https://doi.org/10.15223/policy-037 |
PageCount | 8 |
PublicationDate | 2024-09-01 |
PublicationTitleAbbrev | TAC |
SubjectTerms | Algorithms; Approximation algorithms; Costs; Criteria; Discounting; Integer programming; Linear programming; Markov decision processes; Markov decision processes (MDPs); Markov processes; Mixed integer optimization; Planning; Probabilistic logic; reachability; Reagents; Synthesis; Task analysis; Trajectory; Transient performance |
URI | https://ieeexplore.ieee.org/document/10490145 https://www.proquest.com/docview/3097922231 |
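The problem described in the abstract, minimizing expected total discounted cost subject to a reachability requirement, can be made concrete with a small sketch. Several assumptions here are not taken from the paper: the reachability requirement is modeled as a lower bound `rho` on the probability of eventually reaching the target from an initial state; the target is absorbing and cost-free; and the 4-state MDP, `evaluate`, and `best_stationary_deterministic` are hypothetical illustrations. The search is plain brute-force enumeration of stationary deterministic policies, not the paper's mixed-integer or LP-based algorithms, and its running time grows exponentially with the number of states, so it is only practical for tiny instances.

```python
import itertools

# Hypothetical 4-state MDP for illustration only (not taken from the paper).
# MDP[s] lists the actions available in state s as (cost, {next_state: prob}).
TRAP, TARGET = 2, 3
MDP = {
    0: [(1.0, {TARGET: 0.8, TRAP: 0.2}),  # action 0: cheap but risky
        (2.0, {1: 1.0})],                  # action 1: detour through state 1
    1: [(1.0, {TARGET: 1.0}),              # action 0: reaches target surely
        (0.5, {TARGET: 0.9, TRAP: 0.1})],  # action 1: cheaper, riskier
    TRAP: [(0.0, {TRAP: 1.0})],            # absorbing; target unreachable
    TARGET: [(0.0, {TARGET: 1.0})],        # absorbing and cost-free (assumption)
}
STATES = range(len(MDP))


def evaluate(policy, gamma, iters=2000):
    """Evaluate a stationary deterministic policy (policy[s] = action index).

    Returns (reach, value): reach[s] is the probability of eventually reaching
    TARGET from s, and value[s] is the expected total discounted cost from s.
    Both are computed by fixed-point iteration on the induced Markov chain.
    """
    reach = [1.0 if s == TARGET else 0.0 for s in STATES]
    value = [0.0 for _ in STATES]
    for _ in range(iters):
        # reach_k(s) = Pr(TARGET reached within k steps); it increases to the
        # reachability probability because TARGET is absorbing.
        reach = [sum(p * reach[t] for t, p in MDP[s][policy[s]][1].items())
                 for s in STATES]
        # Bellman evaluation V = c + gamma * P V (a contraction for gamma < 1).
        value = [MDP[s][policy[s]][0]
                 + gamma * sum(p * value[t] for t, p in MDP[s][policy[s]][1].items())
                 for s in STATES]
    return reach, value


def best_stationary_deterministic(s0, gamma, rho):
    """Brute force: minimize the discounted cost from s0 over all stationary
    deterministic policies whose probability of reaching TARGET from s0 is at
    least rho. Returns (policy, cost, reach), or None if none is feasible."""
    best = None
    choices = [range(len(MDP[s])) for s in STATES]
    for policy in itertools.product(*choices):
        reach, value = evaluate(policy, gamma)
        if reach[s0] >= rho - 1e-9 and (best is None or value[s0] < best[1]):
            best = (policy, value[s0], reach[s0])
    return best


# A 0.95 reachability bound rules out the risky action in state 0 (reach 0.8),
# so the detour through state 1 is selected: policy (1, 0, 0, 0), cost 2.9.
print(best_stationary_deterministic(s0=0, gamma=0.9, rho=0.95))
```

Lowering `rho` to 0 recovers the unconstrained optimum, the risky action in state 0 with discounted cost 1.0 and reachability 0.8, which illustrates the trade-off between transient cost and the probabilistic guarantee that the combined criterion is meant to capture.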