MobileInst: Video Instance Segmentation on the Mobile

Bibliographic Details
Published in arXiv.org
Main Authors Zhang, Renhong; Cheng, Tianheng; Yang, Shusheng; Jiang, Haoyi; Zhang, Shuai; Lyu, Jiancheng; Li, Xin; Ying, Xiaowen; Gao, Dashan; Liu, Wenyu; Wang, Xinggang
Format Paper
Language English
Published Ithaca: Cornell University Library, arXiv.org, 18.12.2023
Abstract Video instance segmentation on mobile devices is an important yet very challenging edge AI problem. It mainly suffers from (1) heavy computation and memory costs for frame-by-frame pixel-level instance perception and (2) complicated heuristics for tracking objects. To address these issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. First, MobileInst adopts a mobile vision transformer to extract multi-level semantic features, and presents an efficient query-based dual-transformer instance decoder that produces mask kernels and a semantic-enhanced mask decoder that generates the instance segmentation for each frame. Second, MobileInst exploits simple yet effective kernel reuse and kernel association to track objects for video instance segmentation. Further, we propose temporal query passing to enhance the tracking ability of the kernels. We conduct experiments on the COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst, and evaluate inference latency on a single CPU core of the Snapdragon 778G Mobile Platform, without any other acceleration methods. On the COCO dataset, MobileInst achieves 31.2 mask AP at 433 ms on the mobile CPU, which reduces latency by 50% compared to the previous SOTA. For video instance segmentation, MobileInst achieves 35.0 AP on YouTube-VIS 2019 and 30.1 AP on YouTube-VIS 2021. Code will be made available to facilitate real-world applications and future research.
Copyright 2023. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Discipline Physics
EISSN 2331-8422
Genre Working Paper/Pre-Print
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
PublicationDateYYYYMMDD 2023-12-18
PublicationPlace Ithaca
PublicationTitle arXiv.org
PublicationYear 2023
Publisher Cornell University Library, arXiv.org
SubjectTerms Datasets
Electronic devices
Instance segmentation
Kernels
Semantics
Tracking
Title MobileInst: Video Instance Segmentation on the Mobile
URI https://www.proquest.com/docview/2793243078