IEMask R-CNN: Information-Enhanced Mask R-CNN

The instance segmentation task is relatively difficult in computer vision, which requires not only high-quality masks but also high-accuracy instance category classification. Mask R-CNN has been proven to be a feasible method. However, due to the Feature Pyramid Network (FPN) structure lack useful c...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on big data Vol. 9; no. 2; pp. 688 - 700
Main Authors Bi, Xiuli, Hu, Jinwu, Xiao, Bin, Li, Weisheng, Gao, Xinbo
Format Journal Article
LanguageEnglish
Published Piscataway IEEE 01.04.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:The instance segmentation task is relatively difficult in computer vision, which requires not only high-quality masks but also high-accuracy instance category classification. Mask R-CNN has been proven to be a feasible method. However, due to the Feature Pyramid Network (FPN) structure lack useful channel information, global information and low-level texture information, and mask branch cannot obtain useful local-global information, Mask R-CNN is prevented from obtaining high-quality masks and high-accuracy instance category classification. Therefore, we proposed the Information-enhanced Mask R-CNN, called IEMask R-CNN. In the FPN structure of IEMask R-CNN, the information-enhanced FPN will enhance the useful channel information and the global information of the feature maps to solve the issues that the high-level feature map loses useful channel information and inaccurate of instance category classification, meanwhile the bottom-up path enhancement with adaptive feature fusion will ultilize the precise positioning signal in the lower layer to enhance the feature pyramid. In the mask branch of IEMask R-CNN, an encoding-decoding mask head will strength local-global information to gain a high-quality mask. Without bells and whistles, IEMask R-CNN gains significant gains of about 2.60%, 4.00%, 3.17% over Mask R-CNN on MS COCO2017, Cityscapes and LVIS1.0 benchmarks respectively.
ISSN:2332-7790
2332-7790
2372-2096
DOI:10.1109/TBDATA.2022.3187413