SEGAC: Sample Efficient Generalized Actor Critic for the Stochastic On-Time Arrival Problem

Bibliographic Details
Published in	IEEE Transactions on Intelligent Transportation Systems, Vol. 25, no. 8, pp. 10190-10205
Main Authors	Guo, Hongliang; He, Zhi; Sheng, Wenda; Cao, Zhiguang; Zhou, Yingjie; Gao, Weinan
Format	Journal Article
Language	English
Published	IEEE 01.08.2024
Subjects	Gaussian distribution; Generalized actor critic; Navigation; Optimization; Real-time systems; Reliability; Routing; sample efficiency; stochastic on-time arrival (SOTA); Transportation; variance reduction

Abstract	This paper studies the stochastic on-time arrival (SOTA) problem in transportation networks and introduces a novel reinforcement learning-based algorithm, the sample efficient generalized actor critic (SEGAC). Unlike almost all canonical SOTA solutions, which are usually computationally expensive and do not generalize to unforeseen destination nodes, SEGAC updates the ego vehicle's navigation policy in a sample-efficient manner, reduces the variance of both the value network and the policy network during training, and adapts automatically to new destinations. Furthermore, the pre-trained SEGAC policy network makes real-time decisions within seconds, and SEGAC outperforms state-of-the-art SOTA algorithms in simulations across various transportation networks. We also successfully deploy SEGAC on two real metropolitan transportation networks, Chengdu and Beijing, using real traffic data, with satisfactory results.
Author	Zhou, Yingjie; Sheng, Wenda; Cao, Zhiguang; He, Zhi; Guo, Hongliang; Gao, Weinan
Author_xml	1. Guo, Hongliang (ORCID: 0000-0002-9836-3090), College of Computer Science, Sichuan University (SCU), Chengdu, China
	2. He, Zhi (ORCID: 0000-0002-2805-2975), School of Automation Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China
	3. Sheng, Wenda (ORCID: 0000-0003-0132-3656), School of Automation Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China
	4. Cao, Zhiguang (ORCID: 0000-0002-4499-759X, email: zhiguangcao@outlook.com), School of Computing and Information Systems, Singapore Management University, Bras Basah, Singapore
	5. Zhou, Yingjie (ORCID: 0000-0002-1129-0213, email: yjzhou09@gmail.com), College of Computer Science, Sichuan University (SCU), Chengdu, China
	6. Gao, Weinan (ORCID: 0000-0001-7921-018X), State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, China
CODEN	ITISFG
ContentType	Journal Article
DBID	97E RIA RIE AAYXX CITATION
DOI	10.1109/TITS.2024.3361445
DatabaseName	IEEE All-Society Periodicals Package (ASPP) 2005-present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Xplore CrossRef
DatabaseTitle	CrossRef
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Xplore url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Engineering
EISSN	1558-0016
EndPage	10205
ExternalDocumentID	10_1109_TITS_2024_3361445 10433916
Genre	orig-research
GrantInformation_xml	1. fundername: Sichuan Science and Technology Program, grantid: 2023NSFSC1965, funderid: 10.13039/100012542
	2. fundername: Higher Education Discipline Innovation Project; 111 Project, grantid: B21044, funderid: 10.13039/501100013314
GroupedDBID	-~X 0R~ 29I 4.4 5GY 5VS 6IK 97E AAJGR AASAJ ABQJQ ABTAH ACGFO ACGFS ACIWK ACNCT AENEX AETIX AIBXA AKJIK ALMA_UNASSIGNED_HOLDINGS ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ CS3 DU5 EBS EJD HZ~ H~9 IFIPE IPLJI JAVBF LAI M43 O9- OCL P2P PQQKQ RIA RIE RIG RNS ZY4 AAYXX CITATION
ID	FETCH-LOGICAL-c218t-9639ddd83f9721b3426bfd9ee8a9235aa118a22a66c7702bee70cf455b9caf473
IEDL.DBID	RIE
ISSN	1524-9050
IngestDate	Wed Aug 07 14:11:22 EDT 2024 Wed Aug 14 05:40:28 EDT 2024
IsPeerReviewed	true
IsScholarly	true
Issue	8
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-c218t-9639ddd83f9721b3426bfd9ee8a9235aa118a22a66c7702bee70cf455b9caf473
ORCID	0000-0002-2805-2975 0000-0002-1129-0213 0000-0003-0132-3656 0000-0002-4499-759X 0000-0001-7921-018X 0000-0002-9836-3090
PageCount	16
ParticipantIDs	ieee_primary_10433916 crossref_primary_10_1109_TITS_2024_3361445
PublicationCentury	2000
PublicationDate	2024-Aug.
PublicationDateYYYYMMDD	2024-08-01
PublicationDate_xml	– month: 08 year: 2024 text: 2024-Aug.
PublicationDecade	2020
PublicationTitle	IEEE Transactions on Intelligent Transportation Systems
PublicationTitleAbbrev	TITS
PublicationYear	2024
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssj0014511
Score	2.4599934
SourceID	crossref ieee
SourceType	Aggregation Database Publisher
StartPage	10190
URI	https://ieeexplore.ieee.org/document/10433916
Volume	25
hasFullText	1
inHoldings	1