On Minimizing Total Discounted Cost in MDPs Subject to Reachability Constraints

Bibliographic Details
Published in IEEE Transactions on Automatic Control, Vol. 69, No. 9, pp. 6466-6473
Main Authors Savas, Yagiz, Verginis, Christos K., Hibbard, Michael, Topcu, Ufuk
Format Journal Article
Language English
Published New York IEEE 01.09.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 0018-9286
EISSN 1558-2523
DOI 10.1109/TAC.2024.3384834

Abstract In this article, we study the synthesis of a policy in a Markov decision process (MDP) following which an agent reaches a target state in the MDP while minimizing its total discounted cost. The problem combines a reachability criterion with a discounted cost criterion and naturally expresses the completion of a task with probabilistic guarantees and optimal transient performance. We first establish that an optimal policy for the considered formulation may not exist but that there always exists a near-optimal stationary policy. We additionally provide a necessary and sufficient condition for the existence of an optimal policy. We then restrict our attention to stationary deterministic policies and show that the decision problem associated with the synthesis of an optimal stationary deterministic policy is NP-complete. Finally, we provide an exact algorithm based on mixed-integer linear programming and propose an efficient approximation algorithm based on linear programming for the synthesis of an optimal stationary deterministic policy.
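The constrained reachability problem the abstract describes requires the paper's MILP machinery; as a minimal illustration of the underlying discounted-cost criterion alone, the following sketch runs value iteration on a toy MDP with an absorbing zero-cost target state. The states, actions, costs, and transition probabilities here are invented for illustration and are not taken from the paper.

```python
# Minimal sketch (not the paper's algorithm): value iteration for the
# minimum expected total discounted cost in a toy MDP whose target
# state (state 2) is absorbing with zero cost.

GAMMA = 0.9  # discount factor

# transitions[s][a] = list of (next_state, probability); costs[s][a] = stage cost
transitions = {
    0: {"a": [(1, 1.0)], "b": [(2, 0.5), (0, 0.5)]},
    1: {"a": [(2, 1.0)]},
    2: {"stay": [(2, 1.0)]},  # absorbing target
}
costs = {0: {"a": 1.0, "b": 4.0}, 1: {"a": 1.0}, 2: {"stay": 0.0}}

def value_iteration(eps=1e-10):
    """Iterate the Bellman optimality operator until the sup-norm change < eps."""
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s in transitions:
            best = min(
                costs[s][a] + GAMMA * sum(p * V[t] for t, p in succ)
                for a, succ in transitions[s].items()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

V = value_iteration()
# From state 0, action "a" (cost 1, then cost 1 to reach the target) is
# cheaper than the risky action "b": V[0] = 1 + 0.9 * 1 = 1.9
```

Note that plain value iteration optimizes discounted cost only; enforcing a reachability probability constraint alongside it is exactly what makes the paper's stationary-deterministic synthesis problem NP-complete and motivates the MILP formulation.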
Author Savas, Yagiz
Topcu, Ufuk
Hibbard, Michael
Verginis, Christos K.
Author_xml – sequence: 1
  givenname: Yagiz
  orcidid: 0000-0003-2976-0786
  surname: Savas
  fullname: Savas, Yagiz
  email: yagiz.savas@utexas.edu
  organization: University of Texas at Austin, Austin, TX, USA
– sequence: 2
  givenname: Christos K.
  orcidid: 0000-0002-4289-2866
  surname: Verginis
  fullname: Verginis, Christos K.
  email: christos.verginis@austin.utexas.edu
  organization: University of Texas at Austin, Austin, TX, USA
– sequence: 3
  givenname: Michael
  orcidid: 0000-0002-4697-4551
  surname: Hibbard
  fullname: Hibbard, Michael
  email: mhibbard@utexas.edu
  organization: University of Texas at Austin, Austin, TX, USA
– sequence: 4
  givenname: Ufuk
  orcidid: 0000-0003-0819-9985
  surname: Topcu
  fullname: Topcu, Ufuk
  email: utopcu@utexas.edu
  organization: University of Texas at Austin, Austin, TX, USA
CODEN IETAA9
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Discipline Engineering
EISSN 1558-2523
EndPage 6473
ExternalDocumentID 10_1109_TAC_2024_3384834
10490145
Genre orig-research
GrantInformation_xml – fundername: ARL
  grantid: W911NF-17-2-0181
– fundername: DARPA
  grantid: D19AP00004
– fundername: AFRL
  grantid: FA9550-19-1-0169
IsPeerReviewed true
IsScholarly true
Issue 9
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
ORCID 0000-0002-4697-4551
0000-0003-2976-0786
0000-0003-0819-9985
0000-0002-4289-2866
PQID 3097922231
PQPubID 85475
PageCount 8
PublicationCentury 2000
PublicationDate 2024-09-01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on automatic control
PublicationTitleAbbrev TAC
PublicationYear 2024
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 6466
SubjectTerms Algorithms
Approximation algorithms
Costs
Criteria
Discounting
Integer programming
Linear programming
Markov decision processes
Markov decision processes (MDPs)
Markov processes
Mixed integer
optimization
Planning
Probabilistic logic
reachability
Synthesis
Task analysis
Trajectory
Transient performance
Title On Minimizing Total Discounted Cost in MDPs Subject to Reachability Constraints
URI https://ieeexplore.ieee.org/document/10490145
https://www.proquest.com/docview/3097922231
Volume 69