Progressively Normalized Self-Attention Network for Video Polyp Segmentation

Bibliographic Details
Published in	Medical Image Computing and Computer Assisted Intervention - MICCAI 2021, Vol. 12901, pp. 142-152
Main Authors	Ji, Ge-Peng; Chou, Yu-Cheng; Fan, Deng-Ping; Chen, Geng; Fu, Huazhu; Jha, Debesh; Shao, Ling
Format	Book Chapter
Language	English
Published	Switzerland: Springer International Publishing AG, 2021
Series	Lecture Notes in Computer Science
Subjects	Colonoscopy; Normalized self-attention; Polyp segmentation
ISBN	3030871924, 9783030871925
ISSN	0302-9743, 1611-3349
DOI	10.1007/978-3-030-87193-2_14

More Information
Summary:	Existing video polyp segmentation (VPS) models typically employ convolutional neural networks (CNNs) to extract features. However, due to their limited receptive fields, CNNs cannot fully exploit the global temporal and spatial information in successive video frames, resulting in false positive segmentation results. In this paper, we propose the novel PNS-Net (Progressively Normalized Self-attention Network), which can efficiently learn representations from polyp videos with real-time speed (~140 fps) on a single RTX 2080 GPU and no post-processing. Our PNS-Net is based solely on a basic normalized self-attention block, equipping with recurrence and CNNs entirely. Experiments on challenging VPS datasets demonstrate that the proposed PNS-Net achieves state-of-the-art performance. We also conduct extensive experiments to study the effectiveness of the channel split, soft-attention, and progressive learning strategy. We find that our PNS-Net works well under different settings, making it a promising solution to the VPS task.
Bibliography:	G.-P. Ji and Y.-C. Chou contributed equally. Code: http://dpfan.net/pnsnet/. Electronic supplementary material: The online version of this chapter (https://doi.org/10.1007/978-3-030-87193-2_14) contains supplementary material, which is available to authorized users.