Illation of Video Visual Relation Detection Based on Graph Neural Network

Bibliographic Details
Published in: IEEE Access, Vol. 9, pp. 141144-141153
Main Authors: Qu, Mingcheng; Cui, Jianxun; Nie, Yuxi; Su, Tonghua
Format: Journal Article
Language: English
Published: Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2021
Online Access: Get full text
ISSN: 2169-3536
EISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3115260


Abstract The visual relation detection task is the bridge between semantic text and image information; it can better express the content of images or videos through relation triples of the form <subject, predicate, object>. This research can be applied to image question answering, video subtitles, and other directions. Using video as the input to the visual relationship detection task has received less attention. Therefore, we propose an algorithm based on a graph convolutional neural network and a multi-hypothesis tree to implement video relationship prediction. The video visual relationship detection algorithm is divided into three steps. First, the motion trajectories of the subject and object in the input video clip are generated. Second, a VRGE network module based on a graph convolutional neural network is proposed to predict the relationships between objects in the video clip. Finally, relationship triplets are formed through the multi-hypothesis fusion (MHF) algorithm and the visual relationships. We have verified our method on the benchmark ImageNet-VidVRD dataset. The experimental results demonstrate that our proposed method achieves a satisfactory accuracy of 29.05% and recall of 10.18% for visual relation detection.
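The second step of the pipeline (predicting relations between object tracks with a graph convolution) can be illustrated with a minimal sketch. This is not the authors' VRGE module: the layer sizes, the cosine-similarity scoring against predicate prototypes, and all function names are hypothetical, shown only to make the graph-convolution step concrete.

```python
import numpy as np

def gcn_layer(features, adjacency, weights):
    """One graph-convolution step: average each node's neighborhood
    (with a self-loop), then apply a linear projection and ReLU."""
    a_hat = adjacency + np.eye(adjacency.shape[0])
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalize
    return np.maximum(a_hat @ features @ weights, 0.0)

def predict_relations(track_features, adjacency, weights, predicate_protos):
    """Score every ordered (subject, object) pair of tracks against a set of
    predicate prototype vectors via cosine similarity; return the best
    predicate index per pair. Purely illustrative scoring scheme."""
    h = gcn_layer(track_features, adjacency, weights)
    scores = {}
    n = h.shape[0]
    for s in range(n):
        for o in range(n):
            if s == o:
                continue
            pair = np.concatenate([h[s], h[o]])
            sims = predicate_protos @ pair / (
                np.linalg.norm(predicate_protos, axis=1)
                * np.linalg.norm(pair) + 1e-9)
            scores[(s, o)] = int(np.argmax(sims))
    return scores

# Toy example: two object tracks, fully connected track graph,
# five hypothetical predicate classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 4))          # per-track appearance/motion features
adj = np.ones((2, 2)) - np.eye(2)        # both tracks co-occur in the clip
w = rng.normal(size=(4, 3))              # projection weights
protos = rng.normal(size=(5, 6))         # one prototype per predicate class
rel = predict_relations(feats, adj, w, protos)
print(rel)
```

In the paper's setting, the pair scores would feed the multi-hypothesis fusion step rather than a hard argmax; the argmax here just keeps the sketch short.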
Authors Qu, Mingcheng (Department of Software, Harbin Institute of Technology, Harbin, China)
Cui, Jianxun (Department of Software, Harbin Institute of Technology, Harbin, China)
Nie, Yuxi (Department of Software, Harbin Institute of Technology, Harbin, China; ORCID: 0000-0001-6468-6898; email: yuxi.nie@foxmail.com)
Su, Tonghua (Department of Software, Harbin Institute of Technology, Harbin, China)
CODEN: IAECCG
Copyright: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
Funding: National Natural Science Foundation of China, Grant 61402131
License: Creative Commons Attribution 4.0 (https://creativecommons.org/licenses/by/4.0/legalcode)
Subject Terms: Algorithms; Artificial neural networks; Deep learning; Graph convolutional neural network; Graph neural networks; Hypotheses; Neural networks; Prediction algorithms; Predictive models; Target detection; Target tracking; Task analysis; Trajectory; Video visual relation detection; Visual tasks; Visualization
URI: https://ieeexplore.ieee.org/document/9547267 ; https://www.proquest.com/docview/2586591863 ; https://doaj.org/article/dd585be16d894eec836b8e7a7b9c2668