Illation of Video Visual Relation Detection Based on Graph Neural Network
The visual relation detection task is the bridge between semantic text and image information; it can better express the content of images or videos through the relation triple <subject, predicate, object>. This significant research can be applied to image question answering, video subtitles and...
Published in | IEEE Access, Vol. 9, pp. 141144–141153 |
---|---|
Main Authors | Qu, Mingcheng; Cui, Jianxun; Nie, Yuxi; Su, Tonghua |
Format | Journal Article |
Language | English |
Published | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2021 |
Subjects | |
ISSN | 2169-3536 |
DOI | 10.1109/ACCESS.2021.3115260 |
Abstract | The visual relation detection task is the bridge between semantic text and image information; it can better express the content of images or videos through the relation triple <subject, predicate, object>. This significant research can be applied to image question answering, video subtitles, and other directions. Using video as the input to the visual relationship detection task has received less attention. Therefore, we propose an algorithm based on a graph convolutional neural network and a multi-hypothesis tree to implement video relationship prediction. The video visual relationship detection algorithm is divided into three steps: first, the motion trajectories of the subject and object in the input video clip are generated; second, a VRGE network module based on the graph convolutional neural network is proposed to predict the relationships between objects in the video clip; finally, the relationship triplets are formed from the visual relationships through the multi-hypothesis fusion (MHF) algorithm. We have verified our method on the benchmark ImageNet-VidVRD dataset. The experimental results demonstrate that our proposed method can achieve a satisfactory accuracy of 29.05% and recall of 10.18% for visual relation detection. |
---|---|
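The abstract describes a three-stage pipeline: generate subject/object trajectories per clip, predict a relation for each object pair, and fuse per-clip hypotheses into video-level triplets. The sketch below illustrates that pipeline shape only; the toy spatial rule in `predict_relations` stands in for the paper's VRGE graph-convolution module, the voting in `fuse_hypotheses` stands in for the MHF algorithm, and all names are illustrative, not from the paper.

```python
# Minimal sketch of a three-stage video visual relation pipeline.
# Stand-ins: a spatial heuristic replaces the learned VRGE module,
# and majority voting replaces multi-hypothesis fusion (MHF).
from collections import Counter, defaultdict
from itertools import combinations


def generate_tracklets(clip):
    """Stage 1: group per-frame detections (obj_id, label, x_center)
    into per-object trajectories keyed by (obj_id, label)."""
    tracks = defaultdict(list)
    for frame in clip:
        for obj_id, label, x in frame:
            tracks[(obj_id, label)].append(x)
    return dict(tracks)


def predict_relations(tracks):
    """Stage 2 (toy stand-in for the graph network): score each object
    pair with a spatial rule based on mean horizontal position."""
    triplets = []
    for (ida, la), (idb, lb) in combinations(tracks, 2):
        mean_a = sum(tracks[(ida, la)]) / len(tracks[(ida, la)])
        mean_b = sum(tracks[(idb, lb)]) / len(tracks[(idb, lb)])
        pred = "left_of" if mean_a < mean_b else "right_of"
        triplets.append((la, pred, lb))
    return triplets


def fuse_hypotheses(per_clip_triplets, min_support=2):
    """Stage 3 (greedy stand-in for MHF): keep triplets supported
    by at least `min_support` clips."""
    votes = Counter(t for clip in per_clip_triplets for t in clip)
    return [t for t, n in votes.items() if n >= min_support]


def detect_video_relations(clips):
    per_clip = [predict_relations(generate_tracklets(c)) for c in clips]
    return fuse_hypotheses(per_clip)
```

For example, two clips in which a person (object 0) stays left of a dog (object 1) yield the video-level triplet `("person", "left_of", "dog")`, while a triplet seen in only one clip is discarded.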
Author | Su, Tonghua; Cui, Jianxun; Qu, Mingcheng; Nie, Yuxi |
Author_xml | – sequence: 1 givenname: Mingcheng surname: Qu fullname: Qu, Mingcheng organization: Department of Software, Harbin Institute of Technology, Harbin, China – sequence: 2 givenname: Jianxun surname: Cui fullname: Cui, Jianxun organization: Department of Software, Harbin Institute of Technology, Harbin, China – sequence: 3 givenname: Yuxi orcidid: 0000-0001-6468-6898 surname: Nie fullname: Nie, Yuxi email: yuxi.nie@foxmail.com organization: Department of Software, Harbin Institute of Technology, Harbin, China – sequence: 4 givenname: Tonghua surname: Su fullname: Su, Tonghua organization: Department of Software, Harbin Institute of Technology, Harbin, China |
BookMark | eNqFUU1PAjEUbAwmIvILuGziGezH9uuIqEhCMBH12nS7D11cKXZ3Y_z3FpYQ48UeXievM_NeOueos_EbQGhA8IgQrK_Gk8ntcjmimJIRI4RTgU9QlxKhh4wz0fmFz1C_qtY4HhVbXHbRbFaWti78JvGr5KXIwcdaNbZMHuHwcAM1uD26thXkSQTTYLdvyQKaEIkLqL98eL9ApytbVtA_3D30fHf7NLkfzh-ms8l4PnQpVvUQMqswF1Ryl6YaO6VF5qRWoBiTudNUQqa5cynDihMMOreWMUWylXSCSMx6aNb65t6uzTYUHzZ8G28Ls2_48GpsqAtXgslzrngGRORKpwBOMZEpkFZm2lEhVPS6bL22wX82UNVm7ZuwiesbypXgmijBIou1LBd8VQVYHacSbHYRmDYCs4vAHCKIKv1H5Yp6_6N1sEX5j3bQagsAOE7TPJVUSPYD1hSUQQ |
CODEN | IAECCG |
CitedBy_id | crossref_primary_10_1016_j_neucom_2023_126274 |
Cites_doi | 10.1109/CVPR.2017.733 10.1007/978-3-319-46448-0_51 10.1109/CVPR.2018.00611 10.1007/s11263-015-0816-y 10.1007/978-3-319-10602-1_48 10.1109/ICIP.2018.8451102 10.1109/CVPR.2016.95 10.1016/j.cosrev.2014.04.001 10.1109/TNN.2008.2005605 10.1109/CVPR42600.2020.01065 10.1109/CVPR.2009.5206848 10.1145/3343031.3351058 10.1109/CVPR.2015.7298641 10.1109/CVPR.2017.690 10.1109/CVPR.2017.331 10.1145/3123266.3123380 10.1109/CVPR.2019.00142 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021 |
Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021 |
DOI | 10.1109/ACCESS.2021.3115260 |
DatabaseName | IEEE Xplore (IEEE) IEEE Xplore Open Access Journals IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Engineered Materials Abstracts METADEX Technology Research Database Materials Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional DOAJ Directory of Open Access Journals |
DatabaseTitle | CrossRef Materials Research Database Engineered Materials Abstracts Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace METADEX Computer and Information Systems Abstracts Professional |
DatabaseTitleList | Materials Research Database |
Database_xml | – sequence: 1 dbid: DOA name: Directory of Open Access Journals (DOAJ) url: https://www.doaj.org/ sourceTypes: Open Website – sequence: 2 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering |
EISSN | 2169-3536 |
EndPage | 141153 |
ExternalDocumentID | oai_doaj_org_article_dd585be16d894eec836b8e7a7b9c2668 10_1109_ACCESS_2021_3115260 9547267 |
Genre | orig-research |
GrantInformation_xml | – fundername: National Natural Science Foundation of China grantid: 61402131 funderid: 10.13039/501100001809 |
ISSN | 2169-3536 |
IngestDate | Wed Aug 27 01:22:12 EDT 2025 Mon Jun 30 03:59:56 EDT 2025 Tue Jul 01 04:20:38 EDT 2025 Thu Apr 24 23:11:24 EDT 2025 Wed Aug 27 02:28:58 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Language | English |
License | https://creativecommons.org/licenses/by/4.0/legalcode |
LinkModel | DirectLink |
Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
ORCID | 0000-0001-6468-6898 |
OpenAccessLink | https://doaj.org/article/dd585be16d894eec836b8e7a7b9c2668 |
PQID | 2586591863 |
PQPubID | 4845423 |
PageCount | 10 |
ParticipantIDs | doaj_primary_oai_doaj_org_article_dd585be16d894eec836b8e7a7b9c2668 crossref_citationtrail_10_1109_ACCESS_2021_3115260 proquest_journals_2586591863 ieee_primary_9547267 crossref_primary_10_1109_ACCESS_2021_3115260 |
PublicationCentury | 2000 |
PublicationDate | 20210000 2021-00-00 20210101 2021-01-01 |
PublicationDateYYYYMMDD | 2021-01-01 |
PublicationDate_xml | – year: 2021 text: 20210000 |
PublicationDecade | 2020 |
PublicationPlace | Piscataway |
PublicationPlace_xml | – name: Piscataway |
PublicationTitle | IEEE Access |
PublicationTitleAbbrev | Access |
PublicationYear | 2021 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
References | ref13 ref12 ref15 velickovic (ref2) 2018 ref14 lu (ref3) 2016; 9905 ren (ref5) 2015 lin (ref20) 2014; 8693 ref11 ref22 ref10 ref21 ref1 ref17 ref19 ref18 ref8 nam (ref9) 2016; abs/1608.07242 yang (ref16) 2018; 11205 ref4 ref6 redmon (ref7) 2018; abs/1804.02767 |
References_xml | – ident: ref8 doi: 10.1109/CVPR.2017.733 – volume: 9905 start-page: 852 year: 2016 ident: ref3 article-title: Visual relationship detection with language priors publication-title: Computer Vision-ECCV 2016 doi: 10.1007/978-3-319-46448-0_51 – ident: ref13 doi: 10.1109/CVPR.2018.00611 – start-page: 91 year: 2015 ident: ref5 article-title: Faster R-CNN: Towards real-time object detection with region proposal networks publication-title: Proc Annu Conf Neural Inf Process Syst – ident: ref18 doi: 10.1007/s11263-015-0816-y – volume: 8693 start-page: 740 year: 2014 ident: ref20 article-title: Microsoft COCO: Common objects in context publication-title: Computer Vision-ECCV 2014 doi: 10.1007/978-3-319-10602-1_48 – ident: ref10 doi: 10.1109/ICIP.2018.8451102 – volume: abs/1804.02767 start-page: 1 year: 2018 ident: ref7 article-title: YOLOv3: An incremental improvement publication-title: CoRR – ident: ref19 doi: 10.1109/CVPR.2016.95 – ident: ref1 doi: 10.1016/j.cosrev.2014.04.001 – ident: ref14 doi: 10.1109/TNN.2008.2005605 – ident: ref15 doi: 10.1109/CVPR42600.2020.01065 – ident: ref17 doi: 10.1109/CVPR.2009.5206848 – volume: abs/1608.07242 start-page: 1 year: 2016 ident: ref9 article-title: Modeling and propagating CNNs in a tree structure for visual tracking publication-title: CoRR – ident: ref22 doi: 10.1145/3343031.3351058 – ident: ref4 doi: 10.1109/CVPR.2015.7298641 – ident: ref6 doi: 10.1109/CVPR.2017.690 – start-page: 1 year: 2018 ident: ref2 article-title: Graph attention networks publication-title: Proc 6th Int Conf Learn Represent (ICLR) – volume: 11205 start-page: 690 year: 2018 ident: ref16 article-title: Graph R-CNN for scene graph generation publication-title: Proc 15th Eur Conf Comput Vis – ident: ref12 doi: 10.1109/CVPR.2017.331 – ident: ref21 doi: 10.1145/3123266.3123380 – ident: ref11 doi: 10.1109/CVPR.2019.00142 |
SourceID | doaj proquest crossref ieee |
SourceType | Open Website Aggregation Database Enrichment Source Index Database Publisher |
StartPage | 141144 |
SubjectTerms | Algorithms Artificial neural networks Deep learning graph convolutional neural network Graph neural networks Hypotheses Neural networks Prediction algorithms Predictive models target detection Target tracking Task analysis Trajectory Video visual relation detection Visual tasks Visualization |
Title | Illation of Video Visual Relation Detection Based on Graph Neural Network |
URI | https://ieeexplore.ieee.org/document/9547267 https://www.proquest.com/docview/2586591863 https://doaj.org/article/dd585be16d894eec836b8e7a7b9c2668 |
Volume | 9 |