Language-Guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 31, No. 9, pp. 3469-3481
Main Authors: Zhang, Weixia; Ma, Chao; Wu, Qi; Yang, Xiaokang
Format: Journal Article
Language: English
Published: New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.09.2021

Abstract: The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to a target location in unseen photo-realistic environments according to a given language instruction. The challenges of VLN arise mainly from two aspects. First, the agent needs to attend to the meaningful parts of the language instruction that correspond to the dynamically varying visual environment. Second, during training the agent usually imitates expert demonstrations, i.e., the shortest path to the target location specified by the associated instruction. Because action selection differs between training and inference, an agent trained solely by imitation learning does not perform well. Existing VLN approaches address this issue by sampling the next action from the agent's predicted probability distribution during training, which allows the agent to explore diverse routes in the environments and yields higher success rates. Nevertheless, without being presented with the ground-truth shortest navigation paths during training, the agent may arrive at the target location via an unexpectedly long route. To overcome these challenges, we design a cross-modal grounding module, composed of two complementary attention mechanisms, to equip the agent with a better ability to track the correspondence between the textual and visual modalities. We then propose to recursively alternate the learning schemes of imitation and exploration to narrow the discrepancy between training and inference, and we further exploit the advantages of both learning schemes via adversarial learning. Extensive experimental results on the Room-to-Room (R2R) benchmark dataset demonstrate that the proposed learning scheme is general and complementary to prior art. Our method performs well against state-of-the-art approaches in terms of both effectiveness and efficiency.
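The abstract names two technical components, and the two sketches below illustrate them. First, the cross-modal grounding module attends over the instruction conditioned on the agent's current state. This is a minimal sketch of one soft-attention direction, assuming dot-product scoring and hypothetical dimensions; the paper pairs two complementary attention mechanisms, which this single `attend` function does not reproduce.

```python
# Hypothetical sketch: soft attention of an agent state over instruction
# word features (one direction of cross-modal grounding).
import torch
import torch.nn.functional as F

def attend(query: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    """query: (d,) agent state; keys: (T, d) encoded instruction words.
    Returns an attention-weighted textual context vector of shape (d,)."""
    scores = keys @ query                # (T,) relevance of each word
    weights = F.softmax(scores, dim=0)   # attention distribution over words
    return weights @ keys                # grounded textual context

state = torch.randn(64)                  # hypothetical agent state
words = torch.randn(12, 64)              # hypothetical instruction encoding
context = attend(state, words)           # text grounded at the current step
```

Second, the alternating adversarial scheme: training switches between imitation (executing the expert's shortest-path actions) and exploration (sampling actions from the agent's predicted distribution), and a discriminator learns to tell the two kinds of rollouts apart while the agent learns to fool it. The loop below is a sketch in the spirit of professor forcing, run on dummy data; `TinyAgent`, `TinyDiscriminator`, the even/odd schedule, and the trajectory feature are all stand-ins rather than the paper's architecture.

```python
# Hypothetical sketch: alternating imitation/exploration with an adversarial
# discriminator over rollout statistics (professor-forcing style).
import torch
import torch.nn as nn
import torch.nn.functional as F

N_ACTIONS = 6

class TinyAgent(nn.Module):
    """Stand-in policy: maps a state vector to action logits."""
    def __init__(self, state_dim=32, n_actions=N_ACTIONS):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, state):
        return self.policy(state)

class TinyDiscriminator(nn.Module):
    """Scores a rollout feature: high for imitation, low for exploration."""
    def __init__(self, feat_dim=N_ACTIONS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, feat):
        return self.net(feat).squeeze(-1)

def run_episode(agent, states, expert_actions, explore):
    """One rollout; returns the supervised loss and a trajectory feature
    (mean action distribution) that carries gradients back to the agent."""
    loss, dists = 0.0, []
    for state, expert_a in zip(states, expert_actions):
        logits = agent(state)
        # Imitation executes the expert action; exploration samples its own.
        executed = (torch.distributions.Categorical(logits=logits).sample()
                    if explore else expert_a)  # would drive a real environment
        loss = loss + F.cross_entropy(logits.unsqueeze(0), expert_a.view(1))
        dists.append(torch.softmax(logits, dim=-1))
    return loss / len(states), torch.stack(dists).mean(dim=0)

agent, disc = TinyAgent(), TinyDiscriminator()
opt_agent = torch.optim.Adam(agent.parameters(), lr=1e-3)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-3)

for step in range(100):
    states = [torch.randn(32) for _ in range(8)]              # dummy episode
    expert_actions = [torch.randint(0, N_ACTIONS, ()) for _ in range(8)]
    explore = step % 2 == 1                  # alternate the two schemes
    sup_loss, feat = run_episode(agent, states, expert_actions, explore)

    # Discriminator step: label imitation rollouts 1, exploration rollouts 0.
    target = torch.tensor(0.0 if explore else 1.0)
    d_loss = F.binary_cross_entropy_with_logits(disc(feat.detach()), target)
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # Agent step: supervision plus, on exploration rollouts, an adversarial
    # term pushing them to look like imitation rollouts to the discriminator.
    g_loss = sup_loss
    if explore:
        g_loss = g_loss + F.binary_cross_entropy_with_logits(
            disc(feat), torch.tensor(1.0))
    opt_agent.zero_grad(); g_loss.backward(); opt_agent.step()
```

In the paper the alternation is recursive across training stages and the policy is the cross-modal grounding agent sketched above; the strict even/odd schedule here is only the simplest possible stand-in.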
Authors:
– Zhang, Weixia (ORCID: 0000-0002-3634-2630; email: zwx8981@sjtu.edu.cn), MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
– Ma, Chao (ORCID: 0000-0002-8459-2845; email: chaoma@sjtu.edu.cn), MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
– Wu, Qi (ORCID: 0000-0003-3631-256X; email: qi.wu01@adelaide.edu.au), School of Computer Science, The University of Adelaide, Adelaide, SA, Australia
– Yang, Xiaokang (ORCID: 0000-0003-4029-3322; email: xkyang@sjtu.edu.cn), MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
CODEN: ITCTEM
Copyright: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
DOI: 10.1109/TCSVT.2020.3039522
Discipline: Engineering
EISSN: 1558-2205
Genre: Original research
Funding:
– Shanghai Pujiang Program
– NSFC (grants 61901262, U19B2035, 61527804, 61906119)
– Science and Technology Commission of Shanghai Municipality (STCSM) (grant 18DZ1112300)
– National Key Research and Development Program of China (grant 2016YFB1001003)
ISSN: 1051-8215
Peer reviewed: Yes
Scholarly: Yes
License: https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html; https://doi.org/10.15223/policy-029; https://doi.org/10.15223/policy-037
Publication date: 2021-09-01
Publication place: New York
Publication title: IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
Publisher: IEEE (The Institute of Electrical and Electronics Engineers, Inc.)
Subjects: adversarial learning; attention mechanism; embodied navigation; Generators; Grounding; Inference; Language instruction; Learning; Navigation; Shortest-path problems; Task analysis; Training; Trajectory; Vision-and-language; Visualization
URI: https://ieeexplore.ieee.org/document/9265290; https://www.proquest.com/docview/2568777561