Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. To be specific, a large number of detected objects and optical character recognition (OCR) tokens result in rich visual relationships. Existing works take all visual relationshi...
Published in | IEEE Transactions on Image Processing, Vol. 32, pp. 5060-5074 |
---|---|
Main Authors | Zhou, Sheng; Guo, Dan; Li, Jia; Yang, Xun; Wang, Meng |
Format | Journal Article |
Language | English |
Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2023 |
Subjects | Cognition; Computational modeling; graph inference; Inference; Learning; Optical character recognition; Predictions; Pruning; Question answering (information retrieval); Reasoning; relation learning; spatial relation; Task analysis; text-based visual question answering; Transformers; Visual observation; Visual question answering; Visualization |
Online Access | https://ieeexplore.ieee.org/document/10241306 |
ISSN | 1057-7149 (print); 1941-0042 (electronic) |
DOI | 10.1109/TIP.2023.3310332 |
Abstract | Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. To be specific, a large number of detected objects and optical character recognition (OCR) tokens result in rich visual relationships. Existing works take all visual relationships into account for answer prediction. However, there are three observations: (1) a single subject in the images can be easily detected as multiple objects with distinct bounding boxes (considered repetitive objects). The associations between these repetitive objects are superfluous for answer reasoning; (2) two spatially distant OCR tokens detected in the image frequently have weak semantic dependencies for answer reasoning; and (3) the co-existence of nearby objects and tokens may be indicative of important visual cues for predicting answers. Rather than utilizing all of them for answer prediction, we make an effort to identify the most important connections or eliminate redundant ones. We propose a sparse spatial graph network (SSGN) that introduces a spatially aware relation pruning technique to this task. As spatial factors for relation measurement, we employ spatial distance, geometric dimension, overlap area, and DIoU for spatially aware pruning. We consider three visual relationships for graph learning: object-object, OCR-OCR token, and object-OCR token relationships. SSGN is a progressive graph learning architecture that verifies the pivotal relations in the correlated object-token sparse graph, and then in the respective object-based sparse graph and token-based sparse graph. Experimental results on the TextVQA and ST-VQA datasets demonstrate that SSGN achieves promising performance, and visualization results further demonstrate the interpretability of our method. |
---|---|
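Of the four spatial factors the abstract lists for relation measurement (spatial distance, geometric dimension, overlap area, and DIoU), DIoU is the least self-explanatory. The sketch below is a minimal illustration of the standard Distance-IoU measure (Zheng et al., AAAI 2020), not the authors' implementation; the axis-aligned [x1, y1, x2, y2] box format is an assumption.

```python
import numpy as np

def diou(box_a: np.ndarray, box_b: np.ndarray) -> float:
    """Distance-IoU: IoU minus a normalized center-distance penalty.

    Returns a value in (-1, 1]; unlike plain IoU, it stays informative
    for non-overlapping boxes, which makes it usable as a closeness
    score between any two detections.
    """
    # Intersection area of the two boxes.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter + 1e-9)
    # Squared Euclidean distance between box centers (rho^2).
    cax, cay = (box_a[0] + box_a[2]) / 2.0, (box_a[1] + box_a[3]) / 2.0
    cbx, cby = (box_b[0] + box_b[2]) / 2.0, (box_b[1] + box_b[3]) / 2.0
    rho2 = (cax - cbx) ** 2 + (cay - cby) ** 2
    # Squared diagonal of the smallest box enclosing both (c^2).
    ex1, ey1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    ex2, ey2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    return float(iou - rho2 / c2)
```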
AbstractList | Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. To be specific, a large number of detected objects and optical character recognition (OCR) tokens result in rich visual relationships. Existing works take all visual relationships into account for answer prediction. However, there are three observations: (1) a single subject in the images can be easily detected as multiple objects with distinct bounding boxes (considered repetitive objects). The associations between these repetitive objects are superfluous for answer reasoning; (2) two spatially distant OCR tokens detected in the image frequently have weak semantic dependencies for answer reasoning; and (3) the co-existence of nearby objects and tokens may be indicative of important visual cues for predicting answers. Rather than utilizing all of them for answer prediction, we make an effort to identify the most important connections or eliminate redundant ones. We propose a sparse spatial graph network (SSGN) that introduces a spatially aware relation pruning technique to this task. As spatial factors for relation measurement, we employ spatial distance, geometric dimension, overlap area, and DIoU for spatially aware pruning. We consider three visual relationships for graph learning: object-object, OCR-OCR token, and object-OCR token relationships. SSGN is a progressive graph learning architecture that verifies the pivotal relations in the correlated object-token sparse graph, and then in the respective object-based sparse graph and token-based sparse graph. Experimental results on the TextVQA and ST-VQA datasets demonstrate that SSGN achieves promising performance, and visualization results further demonstrate the interpretability of our method. |
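The record does not spell out the pruning rule itself, but the three observations suggest a band-pass reading: edges between near-duplicate boxes (repetitive objects, DIoU near 1) and between spatially distant pairs (strongly negative DIoU) are both dropped, while moderately close pairs survive. The sketch below is a hypothetical illustration of that reading, reusing diou() from the sketch above; the threshold values lo and hi are invented for the example, not taken from the paper.

```python
from itertools import product

import numpy as np

def prune_edges(boxes_a, boxes_b, lo=-0.3, hi=0.8, same_set=False):
    """Keep (i, j) pairs whose DIoU falls inside the band (lo, hi).

    Assumes diou() from the previous sketch. The upper cut discards
    near-duplicate detections of one subject; the lower cut discards
    spatially distant, weakly related pairs. Thresholds are assumptions.
    """
    edges = []
    for i, j in product(range(len(boxes_a)), range(len(boxes_b))):
        if same_set and i >= j:  # within one set: no self-loops, one edge per pair
            continue
        if lo < diou(boxes_a[i], boxes_b[j]) < hi:
            edges.append((i, j))
    return edges

# Three sparse graphs, matching the relation types named in the abstract.
obj_boxes = [np.array([10.0, 10.0, 60.0, 60.0]), np.array([12.0, 11.0, 58.0, 59.0])]
ocr_boxes = [np.array([30.0, 30.0, 50.0, 40.0]), np.array([300.0, 300.0, 340.0, 320.0])]

obj_obj = prune_edges(obj_boxes, obj_boxes, same_set=True)  # [] - repetitive pair pruned
ocr_ocr = prune_edges(ocr_boxes, ocr_boxes, same_set=True)  # [] - distant tokens pruned
obj_ocr = prune_edges(obj_boxes, ocr_boxes)                 # [(0, 0), (1, 0)] - nearby pairs kept
```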
Author | Zhou, Sheng; Guo, Dan; Li, Jia; Yang, Xun; Wang, Meng |
Author_xml | – sequence: 1 givenname: Sheng orcidid: 0009-0007-4215-5464 surname: Zhou fullname: Zhou, Sheng email: hzgn97@gmail.com organization: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China – sequence: 2 givenname: Dan orcidid: 0000-0003-2594-254X surname: Guo fullname: Guo, Dan email: guodan@hfut.edu.cn organization: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China – sequence: 3 givenname: Jia orcidid: 0000-0001-9446-249X surname: Li fullname: Li, Jia email: jiali@hfut.edu.cn organization: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China – sequence: 4 givenname: Xun orcidid: 0000-0003-0201-1638 surname: Yang fullname: Yang, Xun email: xyang21@ustc.edu.cn organization: School of Information Science and Technology, University of Science and Technology of China, Hefei, China – sequence: 5 givenname: Meng orcidid: 0000-0002-3094-7735 surname: Wang fullname: Wang, Meng email: eric.mengwang@gmail.com organization: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China |
CODEN | IIPRE4 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
DOI | 10.1109/TIP.2023.3310332 |
DatabaseName | IEEE Xplore (IEEE) IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional MEDLINE - Academic |
DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional MEDLINE - Academic |
DatabaseTitleList | Technology Research Database MEDLINE - Academic |
Discipline | Applied Sciences Engineering |
EISSN | 1941-0042 |
EndPage | 5074 |
ExternalDocumentID | 10_1109_TIP_2023_3310332 10241306 |
Genre | orig-research |
GrantInformation_xml | – fundername: National Natural Science Foundation of China grantid: 62020106007; 62272144; U20A20183; 72188101; 62272435; U22A2094; 62202139 funderid: 10.13039/501100001809 – fundername: Major Project of Anhui Province grantid: 202203a05020011 – fundername: National Key Research and Development Program of China grantid: 2022YFB4500600 funderid: 10.13039/501100012166 |
ISSN | 1057-7149 1941-0042 |
IsPeerReviewed | true |
IsScholarly | true |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
ORCID | 0009-0007-4215-5464 0000-0002-3094-7735 0000-0003-2594-254X 0000-0001-9446-249X 0000-0003-0201-1638 |
PMID | 37669188 |
PQID | 2864341532 |
PQPubID | 85429 |
PageCount | 15 |
PublicationCentury | 2000 |
PublicationDate | 2023 |
PublicationDateYYYYMMDD | 2023-01-01 |
PublicationDate_xml | – year: 2023 text: 20230000 |
PublicationDecade | 2020 |
PublicationPlace | New York |
PublicationPlace_xml | – name: New York |
PublicationTitle | IEEE transactions on image processing |
PublicationTitleAbbrev | TIP |
PublicationYear | 2023 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Snippet | Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. To be specific, a large number of... |
SourceID | proquest crossref ieee |
SourceType | Aggregation Database Enrichment Source Index Database Publisher |
StartPage | 5060 |
SubjectTerms | Cognition; Computational modeling; graph inference; Inference; Learning; Optical character recognition; Predictions; Pruning; Question answering (information retrieval); Reasoning; relation learning; spatial relation; Task analysis; text-based visual question answering; Transformers; Visual observation; Visual question answering; Visualization |
Title | Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA |
URI | https://ieeexplore.ieee.org/document/10241306 https://www.proquest.com/docview/2864341532 https://www.proquest.com/docview/2861643292 |
Volume | 32 |