Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. To be specific, a large number of detected objects and optical character recognition (OCR) tokens result in rich visual relationships. Existing works take all visual relationshi...
Published in | IEEE Transactions on Image Processing, Vol. 32, pp. 5060-5074 |
---|---|
Main Authors | Zhou, Sheng; Guo, Dan; Li, Jia; Yang, Xun; Wang, Meng |
Format | Journal Article |
Language | English |
Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2023 |
Subjects | Cognition; Computational modeling; graph inference; Inference; Learning; Optical character recognition; Predictions; Pruning; Question answering (information retrieval); Reasoning; relation learning; spatial relation; Task analysis; text-based visual question answering; Transformers; Visual observation; Visual question answering; Visualization |
Online Access | https://ieeexplore.ieee.org/document/10241306 |
ISSN | 1057-7149 (print); 1941-0042 (electronic) |
DOI | 10.1109/TIP.2023.3310332 |
Abstract | Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. To be specific, a large number of detected objects and optical character recognition (OCR) tokens result in rich visual relationships. Existing works take all visual relationships into account for answer prediction. However, there are three observations: (1) a single subject in the images can be easily detected as multiple objects with distinct bounding boxes (considered repetitive objects). The associations between these repetitive objects are superfluous for answer reasoning; (2) two spatially distant OCR tokens detected in the image frequently have weak semantic dependencies for answer reasoning; and (3) the co-existence of nearby objects and tokens may be indicative of important visual cues for predicting answers. Rather than utilizing all of them for answer prediction, we make an effort to identify the most important connections or eliminate redundant ones. We propose a sparse spatial graph network (SSGN) that introduces a spatially aware relation pruning technique to this task. As spatial factors for relation measurement, we employ spatial distance, geometric dimension, overlap area, and DIoU for spatially aware pruning. We consider three visual relationships for graph learning: object-object, OCR-OCR token, and object-OCR token relationships. SSGN is a progressive graph learning architecture that verifies the pivotal relations in the correlated object-token sparse graph, and then in the respective object-based sparse graph and token-based sparse graph. Experimental results on the TextVQA and ST-VQA datasets demonstrate that SSGN achieves promising performance, and visualization results further demonstrate the interpretability of our method. |
---|---|
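Of the four spatial factors the abstract lists for relation measurement (spatial distance, geometric dimension, overlap area, and DIoU), DIoU is the least self-explanatory. The sketch below is a minimal illustration of the standard Distance-IoU measure (Zheng et al., AAAI 2020), not the authors' implementation; the axis-aligned [x1, y1, x2, y2] box format is an assumption.

```python
import numpy as np

def diou(box_a: np.ndarray, box_b: np.ndarray) -> float:
    """Distance-IoU: IoU minus a normalized center-distance penalty.

    Returns a value in (-1, 1]; unlike plain IoU, it stays informative
    for non-overlapping boxes, which makes it usable as a closeness
    score between any two detections.
    """
    # Intersection area of the two boxes.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter + 1e-9)
    # Squared Euclidean distance between box centers (rho^2).
    cax, cay = (box_a[0] + box_a[2]) / 2.0, (box_a[1] + box_a[3]) / 2.0
    cbx, cby = (box_b[0] + box_b[2]) / 2.0, (box_b[1] + box_b[3]) / 2.0
    rho2 = (cax - cbx) ** 2 + (cay - cby) ** 2
    # Squared diagonal of the smallest box enclosing both (c^2).
    ex1, ey1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    ex2, ey2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    return float(iou - rho2 / c2)
```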
AbstractList | Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. To be specific, a large number of detected objects and optical character recognition (OCR) tokens result in rich visual relationships. Existing works take all visual relationships into account for answer prediction. However, there are three observations: (1) a single subject in the images can be easily detected as multiple objects with distinct bounding boxes (considered repetitive objects). The associations between these repetitive objects are superfluous for answer reasoning; (2) two spatially distant OCR tokens detected in the image frequently have weak semantic dependencies for answer reasoning; and (3) the co-existence of nearby objects and tokens may be indicative of important visual cues for predicting answers. Rather than utilizing all of them for answer prediction, we make an effort to identify the most important connections or eliminate redundant ones. We propose a sparse spatial graph network (SSGN) that introduces a spatially aware relation pruning technique to this task. As spatial factors for relation measurement, we employ spatial distance, geometric dimension, overlap area, and DIoU for spatially aware pruning. We consider three visual relationships for graph learning: object-object, OCR-OCR token, and object-OCR token relationships. SSGN is a progressive graph learning architecture that verifies the pivotal relations in the correlated object-token sparse graph, and then in the respective object-based sparse graph and token-based sparse graph. Experimental results on the TextVQA and ST-VQA datasets demonstrate that SSGN achieves promising performance, and visualization results further demonstrate the interpretability of our method. |
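The record does not spell out the pruning rule itself, but the three observations suggest a band-pass reading: edges between near-duplicate boxes (repetitive objects, DIoU near 1) and between spatially distant pairs (strongly negative DIoU) are both dropped, while moderately close pairs survive. The sketch below is a hypothetical illustration of that reading, reusing diou() from the sketch above; the threshold values lo and hi are invented for the example, not taken from the paper.

```python
from itertools import product

import numpy as np

def prune_edges(boxes_a, boxes_b, lo=-0.3, hi=0.8, same_set=False):
    """Keep (i, j) pairs whose DIoU falls inside the band (lo, hi).

    Assumes diou() from the previous sketch. The upper cut discards
    near-duplicate detections of one subject; the lower cut discards
    spatially distant, weakly related pairs. Thresholds are assumptions.
    """
    edges = []
    for i, j in product(range(len(boxes_a)), range(len(boxes_b))):
        if same_set and i >= j:  # within one set: no self-loops, one edge per pair
            continue
        if lo < diou(boxes_a[i], boxes_b[j]) < hi:
            edges.append((i, j))
    return edges

# Three sparse graphs, matching the relation types named in the abstract.
obj_boxes = [np.array([10.0, 10.0, 60.0, 60.0]), np.array([12.0, 11.0, 58.0, 59.0])]
ocr_boxes = [np.array([30.0, 30.0, 50.0, 40.0]), np.array([300.0, 300.0, 340.0, 320.0])]

obj_obj = prune_edges(obj_boxes, obj_boxes, same_set=True)  # [] - repetitive pair pruned
ocr_ocr = prune_edges(ocr_boxes, ocr_boxes, same_set=True)  # [] - distant tokens pruned
obj_ocr = prune_edges(obj_boxes, ocr_boxes)                 # [(0, 0), (1, 0)] - nearby pairs kept
```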
Author | Zhou, Sheng; Guo, Dan; Li, Jia; Yang, Xun; Wang, Meng |
Author_xml | – sequence: 1 givenname: Sheng orcidid: 0009-0007-4215-5464 surname: Zhou fullname: Zhou, Sheng email: hzgn97@gmail.com organization: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China – sequence: 2 givenname: Dan orcidid: 0000-0003-2594-254X surname: Guo fullname: Guo, Dan email: guodan@hfut.edu.cn organization: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China – sequence: 3 givenname: Jia orcidid: 0000-0001-9446-249X surname: Li fullname: Li, Jia email: jiali@hfut.edu.cn organization: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China – sequence: 4 givenname: Xun orcidid: 0000-0003-0201-1638 surname: Yang fullname: Yang, Xun email: xyang21@ustc.edu.cn organization: School of Information Science and Technology, University of Science and Technology of China, Hefei, China – sequence: 5 givenname: Meng orcidid: 0000-0002-3094-7735 surname: Wang fullname: Wang, Meng email: eric.mengwang@gmail.com organization: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China |
CODEN | IIPRE4 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
DOI | 10.1109/TIP.2023.3310332 |
DatabaseName | IEEE Xplore (IEEE) IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional MEDLINE - Academic |
DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional MEDLINE - Academic |
DatabaseTitleList | Technology Research Database MEDLINE - Academic |
Discipline | Applied Sciences Engineering |
EISSN | 1941-0042 |
EndPage | 5074 |
ExternalDocumentID | 10_1109_TIP_2023_3310332 10241306 |
Genre | orig-research |
GrantInformation_xml | – fundername: National Natural Science Foundation of China grantid: 62020106007; 62272144; U20A20183; 72188101; 62272435; U22A2094; 62202139 funderid: 10.13039/501100001809 – fundername: Major Project of Anhui Province grantid: 202203a05020011 – fundername: National Key Research and Development Program of China grantid: 2022YFB4500600 funderid: 10.13039/501100012166 |
ISSN | 1057-7149 1941-0042 |
IsPeerReviewed | true |
IsScholarly | true |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
ORCID | 0009-0007-4215-5464 0000-0002-3094-7735 0000-0003-2594-254X 0000-0001-9446-249X 0000-0003-0201-1638 |
PMID | 37669188 |
PQID | 2864341532 |
PQPubID | 85429 |
PageCount | 15 |
PublicationCentury | 2000 |
PublicationDate | 2023 |
PublicationDateYYYYMMDD | 2023-01-01 |
PublicationDate_xml | – year: 2023 text: 20230000 |
PublicationDecade | 2020 |
PublicationPlace | New York |
PublicationPlace_xml | – name: New York |
PublicationTitle | IEEE transactions on image processing |
PublicationTitleAbbrev | TIP |
PublicationYear | 2023 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Snippet | Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. To be specific, a large number of... |
SourceID | proquest crossref ieee |
SourceType | Aggregation Database Enrichment Source Index Database Publisher |
StartPage | 5060 |
SubjectTerms | Cognition; Computational modeling; graph inference; Inference; Learning; Optical character recognition; Predictions; Pruning; Question answering (information retrieval); Reasoning; relation learning; spatial relation; Task analysis; text-based visual question answering; Transformers; Visual observation; Visual question answering; Visualization |
Title | Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA |
URI | https://ieeexplore.ieee.org/document/10241306 https://www.proquest.com/docview/2864341532 https://www.proquest.com/docview/2861643292 |
Volume | 32 |