A Switchable Deep Learning Approach for In-Loop Filtering in Video Coding

Bibliographic Details
Published in	IEEE Transactions on Circuits and Systems for Video Technology, Vol. 30, No. 7, pp. 1871-1887
Main Authors	Ding, Dandan; Kong, Lingyi; Chen, Guangyao; Liu, Zoe; Fang, Yong
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 July 2020
Subjects	Adaptation models; Artificial neural networks; CNN; Codec; Coding; Correlation; Deep learning; Encoding; enhancement; Feature extraction; Filtration; Frames (data processing); in-loop filter; Machine learning; Structural hierarchy; Training; Video coding

More Information
Summary:	Deep learning offers great potential for in-loop filtering to improve both coding efficiency and subjective quality in video coding. State-of-the-art work focuses on network structure design and employs a single powerful network to address all cases. In contrast, this paper proposes a systematic deep-learning-based approach that includes an effective Convolutional Neural Network (CNN) structure, a hierarchical training strategy, and a codec-oriented switchable mechanism. First, we propose a novel CNN structure, the Squeeze-and-Excitation Filtering CNN (SEFCNN), as an optional in-loop filter. To capture the non-linear interaction between channels, the SEFCNN consists of two subnets: a Feature EXtracting (FEX) subnet and a Feature ENhancing (FEN) subnet. Then, we develop a hierarchical training strategy that adapts the two subnets to different coding scenarios. For high-rate videos with small artifacts, we train a single global model using the FEX for all types of frames, whereas for low-rate videos with large artifacts, we train different models using both the FEX and the FEN for different types of frames. Finally, we propose an adaptive enhancement mechanism that switches between the CNN-based and the conventional methods, applying the CNN model selectively to certain frames or to certain regions within a frame. Experimental results show that the proposed scheme outperforms state-of-the-art work in coding efficiency, while its computational complexity remains acceptable with GPU acceleration.
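The summary does not state the criterion used to switch between the CNN-based and the conventional filter. As an illustration only, the minimal sketch below assumes an encoder-side decision per block: the CNN output is kept only when its squared error against the original block is strictly lower than that of the conventionally filtered block, and the choice would be signalled to the decoder as a one-bit flag. The names `sse`, `choose_filter`, and `switch_frame` are hypothetical, and the sketch ignores the rate cost of the flags.

```python
from typing import List, Tuple

# A block of samples in row-major order (hypothetical representation).
Block = List[List[int]]


def sse(a: Block, b: Block) -> int:
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def choose_filter(original: Block, conventional: Block,
                  cnn: Block) -> Tuple[bool, Block]:
    """Assumed per-block switch: use the CNN output only if it is strictly
    closer to the original; ties fall back to the conventional filter."""
    use_cnn = sse(original, cnn) < sse(original, conventional)
    return use_cnn, (cnn if use_cnn else conventional)


def switch_frame(original: List[Block], conventional: List[Block],
                 cnn: List[Block]) -> Tuple[bool, List[bool], List[Block]]:
    """Apply the switch to every block of a frame. The frame-level flag is
    False when no block benefits, so the CNN can be skipped for the frame."""
    flags: List[bool] = []
    output: List[Block] = []
    for o, c, n in zip(original, conventional, cnn):
        flag, block = choose_filter(o, c, n)
        flags.append(flag)
        output.append(block)
    return any(flags), flags, output


if __name__ == "__main__":
    orig = [[[10, 10], [10, 10]], [[50, 50], [50, 50]]]
    conv = [[[12, 12], [12, 12]], [[50, 51], [50, 50]]]
    net = [[[10, 11], [10, 10]], [[55, 55], [55, 55]]]
    frame_flag, block_flags, _ = switch_frame(orig, conv, net)
    print(frame_flag, block_flags)
```

In this toy frame the CNN wins the first block (error 1 versus 16) and loses the second (100 versus 1), so only the first block is flagged. A practical encoder would more likely compare a rate-distortion cost that includes the bits spent on the flags.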
ISSN:	1051-8215 (print); 1558-2205 (online)
DOI:	10.1109/TCSVT.2019.2935508