SEGAC: Sample Efficient Generalized Actor Critic for the Stochastic On-Time Arrival Problem
Published in | IEEE transactions on intelligent transportation systems, Vol. 25, no. 8, pp. 10190-10205 |
---|---|
Main Authors | Guo, Hongliang; He, Zhi; Sheng, Wenda; Cao, Zhiguang; Zhou, Yingjie; Gao, Weinan |
Format | Journal Article |
Language | English |
Published | IEEE, 01.08.2024 |
Abstract | This paper studies the stochastic on-time arrival (SOTA) problem in transportation networks and introduces a novel reinforcement learning-based algorithm, sample efficient generalized actor critic (SEGAC). Unlike almost all canonical SOTA solutions, which are usually computationally expensive and generalize poorly to unforeseen destination nodes, SEGAC offers the following characteristics: it updates the ego vehicle's navigation policy in a sample-efficient manner, reduces the variance of both the value network and the policy network during training, and adapts automatically to new destinations. Furthermore, the pre-trained SEGAC policy network enables real-time decision-making within seconds, outperforming state-of-the-art SOTA algorithms in simulations across various transportation networks. We also successfully deploy SEGAC to two real metropolitan transportation networks, Chengdu and Beijing, using real traffic data, with satisfactory results. |
---|---|
Author | Zhou, Yingjie; Sheng, Wenda; Cao, Zhiguang; He, Zhi; Guo, Hongliang; Gao, Weinan |
Author_xml | 1. Guo, Hongliang (ORCID: 0000-0002-9836-3090), College of Computer Science, Sichuan University (SCU), Chengdu, China; 2. He, Zhi (ORCID: 0000-0002-2805-2975), School of Automation Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China; 3. Sheng, Wenda (ORCID: 0000-0003-0132-3656), School of Automation Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China; 4. Cao, Zhiguang (ORCID: 0000-0002-4499-759X; email: zhiguangcao@outlook.com), School of Computing and Information Systems, Singapore Management University, Bras Basah, Singapore; 5. Zhou, Yingjie (ORCID: 0000-0002-1129-0213; email: yjzhou09@gmail.com), College of Computer Science, Sichuan University (SCU), Chengdu, China; 6. Gao, Weinan (ORCID: 0000-0001-7921-018X), State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, China |
CODEN | ITISFG |
ContentType | Journal Article |
DOI | 10.1109/TITS.2024.3361445 |
Discipline | Engineering |
EISSN | 1558-0016 |
EndPage | 10205 |
ExternalDocumentID | 10_1109_TITS_2024_3361445 10433916 |
Genre | orig-research |
GrantInformation_xml | Sichuan Science and Technology Program (grant ID: 2023NSFSC1965; funder ID: 10.13039/100012542); Higher Education Discipline Innovation Project, 111 Project (grant ID: B21044; funder ID: 10.13039/501100013314) |
ISSN | 1524-9050 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 8 |
Language | English |
ORCID | 0000-0002-2805-2975 0000-0002-1129-0213 0000-0003-0132-3656 0000-0002-4499-759X 0000-0001-7921-018X 0000-0002-9836-3090 |
PageCount | 16 |
PublicationCentury | 2000 |
PublicationDate | 2024-Aug. |
PublicationDateYYYYMMDD | 2024-08-01 |
PublicationDate_xml | – month: 08 year: 2024 text: 2024-Aug. |
PublicationDecade | 2020 |
PublicationTitle | IEEE transactions on intelligent transportation systems |
PublicationTitleAbbrev | TITS |
PublicationYear | 2024 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SourceID | crossref ieee |
SourceType | Aggregation Database Publisher |
StartPage | 10190 |
SubjectTerms | Gaussian distribution; Generalized actor critic; Navigation; Optimization; Real-time systems; Reliability; Routing; sample efficiency; stochastic on-time arrival (SOTA); Transportation; variance reduction |
Title | SEGAC: Sample Efficient Generalized Actor Critic for the Stochastic On-Time Arrival Problem |
URI | https://ieeexplore.ieee.org/document/10433916 |
Volume | 25 |