MobileInst: Video Instance Segmentation on the Mobile
Video instance segmentation on mobile devices is an important yet very challenging edge AI problem. It mainly suffers from (1) heavy computation and memory costs for frame-by-frame pixel-level instance perception and (2) complicated heuristics for tracking objects. To address those issues, we presen...
Published in | arXiv.org |
---|---|
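Main Authors | Zhang, Renhong; Cheng, Tianheng; Yang, Shusheng; Jiang, Haoyi; Zhang, Shuai; Lyu, Jiancheng; Li, Xin; Ying, Xiaowen; Gao, Dashan; Liu, Wenyu; Wang, Xinggang |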
Format | Paper |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 18.12.2023 |
Subjects | Datasets; Electronic devices; Instance segmentation; Kernels; Semantics; Tracking |
Abstract | Video instance segmentation on mobile devices is an important yet very challenging edge AI problem. It mainly suffers from (1) heavy computation and memory costs for frame-by-frame pixel-level instance perception and (2) complicated heuristics for tracking objects. To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. Firstly, MobileInst adopts a mobile vision transformer to extract multi-level semantic features and presents an efficient query-based dual-transformer instance decoder for mask kernels and a semantic-enhanced mask decoder to generate instance segmentation per frame. Secondly, MobileInst exploits simple yet effective kernel reuse and kernel association to track objects for video instance segmentation. Further, we propose temporal query passing to enhance the tracking ability for kernels. We conduct experiments on COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst and evaluate the inference latency on one single CPU core of Snapdragon 778G Mobile Platform, without other methods of acceleration. On the COCO dataset, MobileInst achieves 31.2 mask AP and 433 ms on the mobile CPU, which reduces the latency by 50% compared to the previous SOTA. For video instance segmentation, MobileInst achieves 35.0 AP on YouTube-VIS 2019 and 30.1 AP on YouTube-VIS 2021. Code will be available to facilitate real-world applications and future research. |
---|---|
Author | Cheng, Tianheng; Yang, Shusheng; Ying, Xiaowen; Wang, Xinggang; Gao, Dashan; Zhang, Shuai; Jiang, Haoyi; Lyu, Jiancheng; Li, Xin; Zhang, Renhong; Liu, Wenyu |
Author_xml | 1. Zhang, Renhong; 2. Cheng, Tianheng; 3. Yang, Shusheng; 4. Jiang, Haoyi; 5. Zhang, Shuai; 6. Lyu, Jiancheng; 7. Li, Xin; 8. Ying, Xiaowen; 9. Gao, Dashan; 10. Liu, Wenyu; 11. Wang, Xinggang |
ContentType | Paper |
Copyright | 2023. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
Discipline | Physics |
EISSN | 2331-8422 |
Genre | Working Paper/Pre-Print |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
OpenAccessLink | https://www.proquest.com/docview/2793243078?pq-origsite=%requestingapplication% |
PublicationDateYYYYMMDD | 2023-12-18 |
PublicationPlace | Ithaca |
PublicationTitle | arXiv.org |
PublicationYear | 2023 |
Publisher | Cornell University Library, arXiv.org |
SecondaryResourceType | preprint |
SubjectTerms | Datasets; Electronic devices; Instance segmentation; Kernels; Semantics; Tracking |
Title | MobileInst: Video Instance Segmentation on the Mobile |
URI | https://www.proquest.com/docview/2793243078 |
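
The abstract describes tracking objects across frames through kernel association. The sketch below illustrates one plausible reading of that idea and is not the authors' implementation: it assumes each instance in a frame is represented by a mask-kernel vector, and it matches kernels between consecutive frames by cosine similarity with a greedy one-to-one assignment. The function names, the similarity measure, the greedy strategy, and the `threshold` value are all assumptions made for illustration; the record itself does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def associate_kernels(prev_kernels, curr_kernels, threshold=0.5):
    """Hypothetical association step: map each current-frame kernel index to the
    index of its best previous-frame kernel, or to None if nothing is similar enough.

    Matching is greedy and one-to-one: candidate pairs are visited from most to
    least similar, and a pair is accepted only if both kernels are still unmatched.
    """
    candidates = sorted(
        (
            (cosine(curr, prev), ci, pi)
            for ci, curr in enumerate(curr_kernels)
            for pi, prev in enumerate(prev_kernels)
        ),
        reverse=True,
    )
    matches = {ci: None for ci in range(len(curr_kernels))}
    used_prev = set()
    for similarity, ci, pi in candidates:
        if similarity < threshold:
            break  # remaining pairs are even less similar
        if matches[ci] is None and pi not in used_prev:
            matches[ci] = pi
            used_prev.add(pi)
    return matches


# Two tracked instances in the previous frame, three detections in the current one.
prev = [[1.0, 0.0], [0.0, 1.0]]
curr = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
result = associate_kernels(prev, curr)
```

In this toy input the first two current kernels inherit the identities of the previous kernels they most resemble, while the third falls below the threshold and would start a new track. A real system would operate on learned kernel embeddings and might use a different matching rule.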