Illation of Video Visual Relation Detection Based on Graph Neural Network

Bibliographic Details
Published in: IEEE Access, Vol. 9, pp. 141144-141153
Main Authors: Qu, Mingcheng; Cui, Jianxun; Nie, Yuxi; Su, Tonghua
Format: Journal Article
Language: English
Published: Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2021
Online Access: Get full text
ISSN: 2169-3536
EISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3115260


Abstract The visual relation detection task is the bridge between semantic text and image information; it can better express the content of images or videos through relation triples of the form <subject, predicate, object>. This research can be applied to image question answering, video subtitles, and other directions. Using video as the input to the visual relationship detection task has received less attention. Therefore, we propose an algorithm based on a graph convolutional neural network and a multi-hypothesis tree to implement video relationship prediction. The video visual relationship detection algorithm is divided into three steps. First, the motion trajectories of the subject and object in the input video clip are generated. Second, a VRGE network module based on a graph convolutional neural network is proposed to predict the relationships between objects in the video clip. Finally, relationship triplets are formed through the multi-hypothesis fusion (MHF) algorithm and the visual relationships. We have verified our method on the benchmark ImageNet-VidVRD dataset. The experimental results demonstrate that our proposed method achieves a satisfactory accuracy of 29.05% and recall of 10.18% for visual relation detection.
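The second step of the pipeline (predicting relations between object tracks with a graph convolution) can be illustrated with a minimal sketch. This is not the authors' VRGE module: the layer sizes, the cosine-similarity scoring against predicate prototypes, and all function names are hypothetical, shown only to make the graph-convolution step concrete.

```python
import numpy as np

def gcn_layer(features, adjacency, weights):
    """One graph-convolution step: average each node's neighborhood
    (with a self-loop), then apply a linear projection and ReLU."""
    a_hat = adjacency + np.eye(adjacency.shape[0])
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalize
    return np.maximum(a_hat @ features @ weights, 0.0)

def predict_relations(track_features, adjacency, weights, predicate_protos):
    """Score every ordered (subject, object) pair of tracks against a set of
    predicate prototype vectors via cosine similarity; return the best
    predicate index per pair. Purely illustrative scoring scheme."""
    h = gcn_layer(track_features, adjacency, weights)
    scores = {}
    n = h.shape[0]
    for s in range(n):
        for o in range(n):
            if s == o:
                continue
            pair = np.concatenate([h[s], h[o]])
            sims = predicate_protos @ pair / (
                np.linalg.norm(predicate_protos, axis=1)
                * np.linalg.norm(pair) + 1e-9)
            scores[(s, o)] = int(np.argmax(sims))
    return scores

# Toy example: two object tracks, fully connected track graph,
# five hypothetical predicate classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 4))          # per-track appearance/motion features
adj = np.ones((2, 2)) - np.eye(2)        # both tracks co-occur in the clip
w = rng.normal(size=(4, 3))              # projection weights
protos = rng.normal(size=(5, 6))         # one prototype per predicate class
rel = predict_relations(feats, adj, w, protos)
print(rel)
```

In the paper's setting, the pair scores would feed the multi-hypothesis fusion step rather than a hard argmax; the argmax here just keeps the sketch short.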
Authors Qu, Mingcheng (Department of Software, Harbin Institute of Technology, Harbin, China)
Cui, Jianxun (Department of Software, Harbin Institute of Technology, Harbin, China)
Nie, Yuxi (Department of Software, Harbin Institute of Technology, Harbin, China; ORCID: 0000-0001-6468-6898; email: yuxi.nie@foxmail.com)
Su, Tonghua (Department of Software, Harbin Institute of Technology, Harbin, China)
CODEN: IAECCG
Copyright: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
Funding: National Natural Science Foundation of China, Grant 61402131
License: Creative Commons Attribution 4.0 (https://creativecommons.org/licenses/by/4.0/legalcode)
Subject Terms: Algorithms; Artificial neural networks; Deep learning; Graph convolutional neural network; Graph neural networks; Hypotheses; Neural networks; Prediction algorithms; Predictive models; Target detection; Target tracking; Task analysis; Trajectory; Video visual relation detection; Visual tasks; Visualization
URI: https://ieeexplore.ieee.org/document/9547267 ; https://www.proquest.com/docview/2586591863 ; https://doaj.org/article/dd585be16d894eec836b8e7a7b9c2668