Second-order encoding networks for semantic segmentation

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 445, pp. 50–60
Main Authors: Sun, Qiule; Zhang, Zhimin; Li, Peihua
Format: Journal Article
Language: English
Published: Elsevier B.V., 20.07.2021

Summary: Recently, most state-of-the-art semantic segmentation methods have focused on context modeling for more accurate prediction. As real-world images often contain multiple objects and stuff, image features may have complex, multi-modal distributions. However, existing methods do not fully consider such complex distributions and thus have limited capability for context modeling. To address this problem, this paper proposes a second-order encoding network (SoENet), trainable end-to-end, for harvesting complex contextual knowledge. At the core of SoENet is an encoding module that captures second-order statistics in individual feature subspaces. Specifically, we divide the entire feature space into a set of subspaces (clusters) represented by codewords, in each of which a covariance matrix is computed for second-order statistical modeling. The covariance matrices of all subspaces are concatenated to form a 3D tensor, which is then subjected to convolutions and nonlinear activations and finally used to scale the input features. In this way, we encode the context, which reflects the complex feature distribution, into the learning process in an end-to-end manner. The proposed SoENet is evaluated on four commonly used, challenging benchmarks, i.e., PASCAL Context, PASCAL VOC 2012, ADE20K and Cityscapes. The experiments show that our network significantly outperforms its counterparts and is competitive with state-of-the-art methods.
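The core operation the summary describes — soft-assigning features to codeword subspaces and computing a covariance matrix per subspace — can be sketched as follows. This is a hedged illustration, not the authors' implementation: the assignment weighting, smoothing factor, and normalization here are assumptions, and the paper's actual SoENet module (including the subsequent convolutions and feature scaling) may differ in detail.

```python
import numpy as np

def second_order_encoding(X, codewords, smoothing=1.0):
    """Sketch of per-subspace second-order statistics.
    X:         (N, D) flattened spatial features.
    codewords: (K, D) learnable cluster centers (here fixed for illustration).
    Returns a stacked (K, D, D) tensor of weighted covariance matrices,
    analogous to the 3D tensor the summary describes."""
    # Residual of every feature to every codeword: (K, N, D)
    resid = X[None, :, :] - codewords[:, None, :]
    # Soft assignment of each feature over codewords (assumed Gaussian-style
    # weighting by negative squared distance; the paper may use another form)
    dist2 = (resid ** 2).sum(axis=-1)                  # (K, N)
    A = np.exp(-smoothing * dist2)
    A /= A.sum(axis=0, keepdims=True) + 1e-8           # normalize over K
    # Weighted covariance in each subspace: sum_n A[k,n] * r r^T
    covs = np.einsum('kn,knd,kne->kde', A, resid, resid)
    covs /= A.sum(axis=1)[:, None, None] + 1e-8        # per-subspace mass
    return covs
```

In a full module, the stacked covariances would then pass through convolutions and nonlinear activations to produce a channel-wise scaling vector applied back to the input features; that learned part is omitted here since its exact architecture is not given in the record.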
ISSN: 0925-2312; 1872-8286
DOI: 10.1016/j.neucom.2021.03.003