On Minimizing Total Discounted Cost in MDPs Subject to Reachability Constraints

Bibliographic Details
Published in IEEE Transactions on Automatic Control, Vol. 69, No. 9, pp. 6466-6473
Main Authors Savas, Yagiz, Verginis, Christos K., Hibbard, Michael, Topcu, Ufuk
Format Journal Article
Language English
Published New York IEEE 01.09.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 0018-9286
EISSN 1558-2523
DOI 10.1109/TAC.2024.3384834

Abstract In this article, we study the synthesis of a policy in a Markov decision process (MDP) following which an agent reaches a target state in the MDP while minimizing its total discounted cost. The problem combines a reachability criterion with a discounted cost criterion and naturally expresses the completion of a task with probabilistic guarantees and optimal transient performance. We first establish that an optimal policy for the considered formulation may not exist but that there always exists a near-optimal stationary policy. We additionally provide a necessary and sufficient condition for the existence of an optimal policy. We then restrict our attention to stationary deterministic policies and show that the decision problem associated with the synthesis of an optimal stationary deterministic policy is NP-complete. Finally, we provide an exact algorithm based on mixed-integer linear programming and propose an efficient approximation algorithm based on linear programming for the synthesis of an optimal stationary deterministic policy.
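The constrained reachability problem the abstract describes requires the paper's MILP machinery; as a minimal illustration of the underlying discounted-cost criterion alone, the following sketch runs value iteration on a toy MDP with an absorbing zero-cost target state. The states, actions, costs, and transition probabilities here are invented for illustration and are not taken from the paper.

```python
# Minimal sketch (not the paper's algorithm): value iteration for the
# minimum expected total discounted cost in a toy MDP whose target
# state (state 2) is absorbing with zero cost.

GAMMA = 0.9  # discount factor

# transitions[s][a] = list of (next_state, probability); costs[s][a] = stage cost
transitions = {
    0: {"a": [(1, 1.0)], "b": [(2, 0.5), (0, 0.5)]},
    1: {"a": [(2, 1.0)]},
    2: {"stay": [(2, 1.0)]},  # absorbing target
}
costs = {0: {"a": 1.0, "b": 4.0}, 1: {"a": 1.0}, 2: {"stay": 0.0}}

def value_iteration(eps=1e-10):
    """Iterate the Bellman optimality operator until the sup-norm change < eps."""
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s in transitions:
            best = min(
                costs[s][a] + GAMMA * sum(p * V[t] for t, p in succ)
                for a, succ in transitions[s].items()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

V = value_iteration()
# From state 0, action "a" (cost 1, then cost 1 to reach the target) is
# cheaper than the risky action "b": V[0] = 1 + 0.9 * 1 = 1.9
```

Note that plain value iteration optimizes discounted cost only; enforcing a reachability probability constraint alongside it is exactly what makes the paper's stationary-deterministic synthesis problem NP-complete and motivates the MILP formulation.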
Author Savas, Yagiz
Topcu, Ufuk
Hibbard, Michael
Verginis, Christos K.
Author_xml – sequence: 1
  givenname: Yagiz
  orcidid: 0000-0003-2976-0786
  surname: Savas
  fullname: Savas, Yagiz
  email: yagiz.savas@utexas.edu
  organization: University of Texas at Austin, Austin, TX, USA
– sequence: 2
  givenname: Christos K.
  orcidid: 0000-0002-4289-2866
  surname: Verginis
  fullname: Verginis, Christos K.
  email: christos.verginis@austin.utexas.edu
  organization: University of Texas at Austin, Austin, TX, USA
– sequence: 3
  givenname: Michael
  orcidid: 0000-0002-4697-4551
  surname: Hibbard
  fullname: Hibbard, Michael
  email: mhibbard@utexas.edu
  organization: University of Texas at Austin, Austin, TX, USA
– sequence: 4
  givenname: Ufuk
  orcidid: 0000-0003-0819-9985
  surname: Topcu
  fullname: Topcu, Ufuk
  email: utopcu@utexas.edu
  organization: University of Texas at Austin, Austin, TX, USA
CODEN IETAA9
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Discipline Engineering
EISSN 1558-2523
EndPage 6473
ExternalDocumentID 10_1109_TAC_2024_3384834
10490145
Genre orig-research
GrantInformation_xml – fundername: ARL
  grantid: W911NF-17-2-0181
– fundername: DARPA
  grantid: D19AP00004
– fundername: AFRL
  grantid: FA9550-19-1-0169
IsPeerReviewed true
IsScholarly true
Issue 9
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
ORCID 0000-0002-4697-4551
0000-0003-2976-0786
0000-0003-0819-9985
0000-0002-4289-2866
PQID 3097922231
PQPubID 85475
PageCount 8
PublicationCentury 2000
PublicationDate 2024-09-01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on automatic control
PublicationTitleAbbrev TAC
PublicationYear 2024
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 6466
SubjectTerms Algorithms
Approximation algorithms
Costs
Criteria
Discounting
Integer programming
Linear programming
Markov decision processes
Markov decision processes (MDPs)
Markov processes
Mixed integer
optimization
Planning
Probabilistic logic
reachability
Synthesis
Task analysis
Trajectory
Transient performance
Title On Minimizing Total Discounted Cost in MDPs Subject to Reachability Constraints
URI https://ieeexplore.ieee.org/document/10490145
https://www.proquest.com/docview/3097922231
Volume 69