SEGAC: Sample Efficient Generalized Actor Critic for the Stochastic On-Time Arrival Problem

Bibliographic Details
Published in: IEEE Transactions on Intelligent Transportation Systems, Vol. 25, No. 8, pp. 10190-10205
Main Authors: Guo, Hongliang; He, Zhi; Sheng, Wenda; Cao, Zhiguang; Zhou, Yingjie; Gao, Weinan
Format: Journal Article
Language: English
Published: IEEE, 2024-08-01

Abstract This paper studies the stochastic on-time arrival (SOTA) problem in transportation networks and introduces a novel reinforcement learning-based algorithm, namely sample efficient generalized actor critic (SEGAC). Unlike almost all canonical SOTA solutions, which are usually computationally expensive and generalize poorly to unforeseen destination nodes, SEGAC offers three appealing characteristics: it updates the ego vehicle's navigation policy in a sample efficient manner, it reduces the variance of both the value network and the policy network during training, and it automatically adapts to new destinations. Furthermore, the pre-trained SEGAC policy network makes real-time decisions within seconds and outperforms state-of-the-art SOTA algorithms in simulations across various transportation networks. We also successfully deploy SEGAC to two real metropolitan transportation networks, Chengdu and Beijing, using real traffic data, with satisfactory results.
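The abstract does not restate the SOTA objective. As the problem is commonly formulated, SOTA seeks the route that maximizes the probability of reaching the destination within a given time budget, rather than the route with the smallest expected travel time. The sketch below illustrates only that criterion on a hypothetical two-path example. It assumes independent Gaussian link travel times, and the path names `A`/`B`, the numbers, and the helper functions are made up for illustration. It is a fixed-path simplification, not SEGAC, which instead learns a navigation policy with an actor critic.

```python
import math
import random

# Hypothetical toy data: each path is a list of links, and each link's travel
# time is ASSUMED to be an independent Gaussian (mean, std). Illustrative only.
PATHS = {
    "A": [(10.0, 1.0), (12.0, 1.5)],  # mean 22, variance 1 + 2.25 = 3.25
    "B": [(9.0, 4.0), (11.0, 3.0)],   # mean 20, variance 16 + 9 = 25
}


def on_time_prob_exact(path, budget):
    """P(total travel time <= budget) when link times are independent Gaussians."""
    mu = sum(m for m, _ in path)
    sigma = math.sqrt(sum(s * s for _, s in path))
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf((budget - mu) / (sigma * math.sqrt(2.0))))


def on_time_prob_mc(path, budget, n=50_000, seed=0):
    """Monte Carlo estimate of the same on-time arrival probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        total = sum(rng.gauss(m, s) for m, s in path)
        hits += total <= budget  # bool counts as 0 or 1
    return hits / n


def best_path(paths, budget):
    """SOTA-style choice: maximize on-time probability, not minimize mean time."""
    return max(paths, key=lambda name: on_time_prob_exact(paths[name], budget))


if __name__ == "__main__":
    for budget in (21.0, 25.0):
        probs = {k: round(on_time_prob_exact(p, budget), 3) for k, p in PATHS.items()}
        print(budget, probs, "->", best_path(PATHS, budget))
```

With a budget of 25 time units the low-variance path A is preferred (about 0.95 versus 0.84), while with a budget of 21 the higher-variance but faster-on-average path B wins (about 0.58 versus 0.29). A minimum-expected-time criterion would pick B in both cases; under the on-time criterion the preferred route changes with the budget.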
Author Zhou, Yingjie
Sheng, Wenda
Cao, Zhiguang
He, Zhi
Guo, Hongliang
Gao, Weinan
Author_xml – sequence: 1
  givenname: Hongliang
  orcidid: 0000-0002-9836-3090
  surname: Guo
  fullname: Guo, Hongliang
  organization: College of Computer Science, Sichuan University (SCU), Chengdu, China
– sequence: 2
  givenname: Zhi
  orcidid: 0000-0002-2805-2975
  surname: He
  fullname: He, Zhi
  organization: School of Automation Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China
– sequence: 3
  givenname: Wenda
  orcidid: 0000-0003-0132-3656
  surname: Sheng
  fullname: Sheng, Wenda
  organization: School of Automation Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China
– sequence: 4
  givenname: Zhiguang
  orcidid: 0000-0002-4499-759X
  surname: Cao
  fullname: Cao, Zhiguang
  email: zhiguangcao@outlook.com
  organization: School of Computing and Information Systems, Singapore Management University, Bras Basah, Singapore
– sequence: 5
  givenname: Yingjie
  orcidid: 0000-0002-1129-0213
  surname: Zhou
  fullname: Zhou, Yingjie
  email: yjzhou09@gmail.com
  organization: College of Computer Science, Sichuan University (SCU), Chengdu, China
– sequence: 6
  givenname: Weinan
  orcidid: 0000-0001-7921-018X
  surname: Gao
  fullname: Gao, Weinan
  organization: State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, China
CODEN ITISFG
ContentType Journal Article
DOI 10.1109/TITS.2024.3361445
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Xplore
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1558-0016
EndPage 10205
ExternalDocumentID 10_1109_TITS_2024_3361445
10433916
Genre orig-research
GrantInformation_xml – fundername: Sichuan Science and Technology Program
  grantid: 2023NSFSC1965
  funderid: 10.13039/100012542
– fundername: Higher Education Discipline Innovation Project; 111 Project
  grantid: B21044
  funderid: 10.13039/501100013314
ISSN 1524-9050
IngestDate Wed Aug 07 14:11:22 EDT 2024
Wed Aug 14 05:40:28 EDT 2024
IsPeerReviewed true
IsScholarly true
Issue 8
Language English
LinkModel DirectLink
ORCID 0000-0002-2805-2975
0000-0002-1129-0213
0000-0003-0132-3656
0000-0002-4499-759X
0000-0001-7921-018X
0000-0002-9836-3090
PageCount 16
ParticipantIDs ieee_primary_10433916
crossref_primary_10_1109_TITS_2024_3361445
PublicationCentury 2000
PublicationDate 2024-Aug.
PublicationDateYYYYMMDD 2024-08-01
PublicationDate_xml – month: 08
  year: 2024
  text: 2024-Aug.
PublicationDecade 2020
PublicationTitle IEEE transactions on intelligent transportation systems
PublicationTitleAbbrev TITS
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SourceID crossref
ieee
SourceType Aggregation Database
Publisher
StartPage 10190
SubjectTerms Gaussian distribution
Generalized actor critic
Navigation
Optimization
Real-time systems
Reliability
Routing
sample efficiency
stochastic on-time arrival (SOTA)
Transportation
variance reduction
Title SEGAC: Sample Efficient Generalized Actor Critic for the Stochastic On-Time Arrival Problem
URI https://ieeexplore.ieee.org/document/10433916
Volume 25