Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA

Bibliographic Details
Published in IEEE Transactions on Image Processing, Vol. 32, pp. 5060-5074
Main Authors Zhou, Sheng; Guo, Dan; Li, Jia; Yang, Xun; Wang, Meng
Format Journal Article
Language English
Published New York: IEEE, 2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
ISSN 1057-7149
EISSN 1941-0042
DOI 10.1109/TIP.2023.3310332

Abstract Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. Specifically, the large number of detected objects and optical character recognition (OCR) tokens results in rich visual relationships, and existing works take all of these relationships into account for answer prediction. However, we make three observations: (1) a single subject in an image can easily be detected as multiple objects with distinct bounding boxes (repetitive objects), and the associations between these repetitive objects are superfluous for answer reasoning; (2) two spatially distant OCR tokens detected in the image frequently have weak semantic dependencies for answer reasoning; and (3) the co-existence of nearby objects and tokens may indicate important visual cues for predicting answers. Rather than utilizing all relationships for answer prediction, we identify the most important connections and eliminate redundant ones. We propose a sparse spatial graph network (SSGN) that introduces a spatially aware relation pruning technique to this task. As spatial factors for relation measurement, we employ spatial distance, geometric dimension, overlap area, and DIoU for spatially aware pruning. We consider three visual relationships for graph learning: object-object, OCR-OCR token, and object-OCR token relationships. SSGN is a progressive graph learning architecture that verifies the pivotal relations first in the correlated object-token sparse graph, and then in the respective object-based and token-based sparse graphs. Experimental results on the TextVQA and ST-VQA datasets demonstrate that SSGN achieves promising performance, and visualization results further demonstrate the interpretability of our method.
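The abstract names DIoU (Distance-IoU) as one of the spatial factors used for spatially aware relation pruning. DIoU augments plain intersection-over-union with a normalized center-distance penalty, so it scores two boxes as related only when they both overlap and sit close together. The sketch below is only an illustration of that general idea, not the paper's implementation; the function names and the pruning threshold are hypothetical.

```python
def diou(box_a, box_b):
    """Distance-IoU between two boxes given as (x1, y1, x2, y2):
    IoU minus d^2 / c^2, where d is the distance between box centers
    and c is the diagonal of the smallest enclosing box."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area (zero if the boxes do not overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union area
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between box centers
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + \
         ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest enclosing box
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    return iou - d2 / c2 if c2 > 0 else iou


def prune_edge(box_a, box_b, threshold=-0.5):
    """Keep a graph edge only if the two regions score above a
    (hypothetical) DIoU threshold; distant pairs are pruned."""
    return diou(box_a, box_b) >= threshold
```

DIoU lies in (-1, 1]: identical boxes score 1.0, while non-overlapping boxes whose centers are far apart approach -1, which is what makes it usable as a single sparsification score for object-object, OCR-OCR, and object-OCR edges.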
Author_xml – sequence: 1
  givenname: Sheng
  orcidid: 0009-0007-4215-5464
  surname: Zhou
  fullname: Zhou, Sheng
  email: hzgn97@gmail.com
  organization: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
– sequence: 2
  givenname: Dan
  orcidid: 0000-0003-2594-254X
  surname: Guo
  fullname: Guo, Dan
  email: guodan@hfut.edu.cn
  organization: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
– sequence: 3
  givenname: Jia
  orcidid: 0000-0001-9446-249X
  surname: Li
  fullname: Li, Jia
  email: jiali@hfut.edu.cn
  organization: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
– sequence: 4
  givenname: Xun
  orcidid: 0000-0003-0201-1638
  surname: Yang
  fullname: Yang, Xun
  email: xyang21@ustc.edu.cn
  organization: School of Information Science and Technology, University of Science and Technology of China, Hefei, China
– sequence: 5
  givenname: Meng
  orcidid: 0000-0002-3094-7735
  surname: Wang
  fullname: Wang, Meng
  email: eric.mengwang@gmail.com
  organization: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
CODEN IIPRE4
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
DOI 10.1109/TIP.2023.3310332
DatabaseName IEEE Xplore (IEEE)
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
MEDLINE - Academic

Discipline Applied Sciences
Engineering
EISSN 1941-0042
EndPage 5074
ExternalDocumentID 10_1109_TIP_2023_3310332
10241306
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62020106007; 62272144; U20A20183; 72188101; 62272435; U22A2094; 62202139
  funderid: 10.13039/501100001809
– fundername: Major Project of Anhui Province
  grantid: 202203a05020011
– fundername: National Key Research and Development Program of China
  grantid: 2022YFB4500600
  funderid: 10.13039/501100012166
ISSN 1057-7149
IsPeerReviewed true
IsScholarly true
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
PMID 37669188
PQID 2864341532
PQPubID 85429
PageCount 15
PublicationDate 2023
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on image processing
PublicationTitleAbbrev TIP
PublicationYear 2023
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 5060
SubjectTerms Cognition
Computational modeling
graph inference
Inference
Learning
Optical character recognition
Predictions
Pruning
Question answering (information retrieval)
Reasoning
relation learning
spatial relation
Task analysis
text-based visual question answering
Transformers
Visual observation
Visual question answering
Visualization
Title Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
URI https://ieeexplore.ieee.org/document/10241306
https://www.proquest.com/docview/2864341532
https://www.proquest.com/docview/2861643292
Volume 32