Language-Guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 31, No. 9, pp. 3469-3481
Main Authors: Zhang, Weixia; Ma, Chao; Wu, Qi; Yang, Xiaokang
Format: Journal Article
Language: English
Published: New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.09.2021

Abstract: The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to a target location in unseen photo-realistic environments according to a given language instruction. The challenges of VLN arise mainly from two aspects. First, the agent needs to attend to the meaningful parts of the language instruction that correspond to the dynamically varying visual environment. Second, during training the agent usually imitates expert demonstrations, i.e., the shortest path to the target location specified by the associated instruction. Because action selection differs between training and inference, an agent trained solely by imitation learning does not perform well. Existing VLN approaches address this issue by sampling the next action from the agent's predicted probability distribution during training, which allows the agent to explore diverse routes in the environments and yields higher success rates. Nevertheless, without being presented with the ground-truth shortest navigation paths during training, the agent may arrive at the target location via an unexpectedly long route. To overcome these challenges, we design a cross-modal grounding module, composed of two complementary attention mechanisms, to equip the agent with a better ability to track the correspondence between the textual and visual modalities. We then propose to recursively alternate the learning schemes of imitation and exploration to narrow the discrepancy between training and inference, and we further exploit the advantages of both learning schemes via adversarial learning. Extensive experimental results on the Room-to-Room (R2R) benchmark dataset demonstrate that the proposed learning scheme is general and complementary to prior art. Our method performs well against state-of-the-art approaches in terms of both effectiveness and efficiency.
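The abstract names two technical components, and the two sketches below illustrate them. First, the cross-modal grounding module attends over the instruction conditioned on the agent's current state. This is a minimal sketch of one soft-attention direction, assuming dot-product scoring and hypothetical dimensions; the paper pairs two complementary attention mechanisms, which this single `attend` function does not reproduce.

```python
# Hypothetical sketch: soft attention of an agent state over instruction
# word features (one direction of cross-modal grounding).
import torch
import torch.nn.functional as F

def attend(query: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    """query: (d,) agent state; keys: (T, d) encoded instruction words.
    Returns an attention-weighted textual context vector of shape (d,)."""
    scores = keys @ query                # (T,) relevance of each word
    weights = F.softmax(scores, dim=0)   # attention distribution over words
    return weights @ keys                # grounded textual context

state = torch.randn(64)                  # hypothetical agent state
words = torch.randn(12, 64)              # hypothetical instruction encoding
context = attend(state, words)           # text grounded at the current step
```

Second, the alternating adversarial scheme: training switches between imitation (executing the expert's shortest-path actions) and exploration (sampling actions from the agent's predicted distribution), and a discriminator learns to tell the two kinds of rollouts apart while the agent learns to fool it. The loop below is a sketch in the spirit of professor forcing, run on dummy data; `TinyAgent`, `TinyDiscriminator`, the even/odd schedule, and the trajectory feature are all stand-ins rather than the paper's architecture.

```python
# Hypothetical sketch: alternating imitation/exploration with an adversarial
# discriminator over rollout statistics (professor-forcing style).
import torch
import torch.nn as nn
import torch.nn.functional as F

N_ACTIONS = 6

class TinyAgent(nn.Module):
    """Stand-in policy: maps a state vector to action logits."""
    def __init__(self, state_dim=32, n_actions=N_ACTIONS):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, state):
        return self.policy(state)

class TinyDiscriminator(nn.Module):
    """Scores a rollout feature: high for imitation, low for exploration."""
    def __init__(self, feat_dim=N_ACTIONS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, feat):
        return self.net(feat).squeeze(-1)

def run_episode(agent, states, expert_actions, explore):
    """One rollout; returns the supervised loss and a trajectory feature
    (mean action distribution) that carries gradients back to the agent."""
    loss, dists = 0.0, []
    for state, expert_a in zip(states, expert_actions):
        logits = agent(state)
        # Imitation executes the expert action; exploration samples its own.
        executed = (torch.distributions.Categorical(logits=logits).sample()
                    if explore else expert_a)  # would drive a real environment
        loss = loss + F.cross_entropy(logits.unsqueeze(0), expert_a.view(1))
        dists.append(torch.softmax(logits, dim=-1))
    return loss / len(states), torch.stack(dists).mean(dim=0)

agent, disc = TinyAgent(), TinyDiscriminator()
opt_agent = torch.optim.Adam(agent.parameters(), lr=1e-3)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-3)

for step in range(100):
    states = [torch.randn(32) for _ in range(8)]              # dummy episode
    expert_actions = [torch.randint(0, N_ACTIONS, ()) for _ in range(8)]
    explore = step % 2 == 1                  # alternate the two schemes
    sup_loss, feat = run_episode(agent, states, expert_actions, explore)

    # Discriminator step: label imitation rollouts 1, exploration rollouts 0.
    target = torch.tensor(0.0 if explore else 1.0)
    d_loss = F.binary_cross_entropy_with_logits(disc(feat.detach()), target)
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # Agent step: supervision plus, on exploration rollouts, an adversarial
    # term pushing them to look like imitation rollouts to the discriminator.
    g_loss = sup_loss
    if explore:
        g_loss = g_loss + F.binary_cross_entropy_with_logits(
            disc(feat), torch.tensor(1.0))
    opt_agent.zero_grad(); g_loss.backward(); opt_agent.step()
```

In the paper the alternation is recursive across training stages and the policy is the cross-modal grounding agent sketched above; the strict even/odd schedule here is only the simplest possible stand-in.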
Authors:
– Zhang, Weixia (ORCID: 0000-0002-3634-2630; email: zwx8981@sjtu.edu.cn), MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
– Ma, Chao (ORCID: 0000-0002-8459-2845; email: chaoma@sjtu.edu.cn), MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
– Wu, Qi (ORCID: 0000-0003-3631-256X; email: qi.wu01@adelaide.edu.au), School of Computer Science, The University of Adelaide, Adelaide, SA, Australia
– Yang, Xiaokang (ORCID: 0000-0003-4029-3322; email: xkyang@sjtu.edu.cn), MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
CODEN: ITCTEM
Copyright: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
DOI: 10.1109/TCSVT.2020.3039522
Discipline: Engineering
EISSN: 1558-2205
Genre: Original research
Funding:
– Shanghai Pujiang Program
– NSFC (grants 61901262, U19B2035, 61527804, 61906119)
– Science and Technology Commission of Shanghai Municipality (STCSM) (grant 18DZ1112300)
– National Key Research and Development Program of China (grant 2016YFB1001003)
ISSN: 1051-8215
Peer reviewed: Yes
Scholarly: Yes
License: https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html; https://doi.org/10.15223/policy-029; https://doi.org/10.15223/policy-037
Publication date: 2021-09-01
Publication place: New York
Publication title: IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
Publisher: IEEE (The Institute of Electrical and Electronics Engineers, Inc.)
Subjects: adversarial learning; attention mechanism; embodied navigation; Generators; Grounding; Inference; Language instruction; Learning; Navigation; Shortest-path problems; Task analysis; Training; Trajectory; Vision-and-language; Visualization
URI: https://ieeexplore.ieee.org/document/9265290; https://www.proquest.com/docview/2568777561