A Switchable Deep Learning Approach for In-Loop Filtering in Video Coding

Deep learning provides a great potential for in-loop filtering to improve both coding efficiency and subjective quality in video coding. State-of-the-art work focuses on network structure design and employs a single powerful network to solve all problems. In contrast, this paper proposes a deep lear...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on circuits and systems for video technology Vol. 30; no. 7; pp. 1871 - 1887
Main Authors Ding, Dandan, Kong, Lingyi, Chen, Guangyao, Liu, Zoe, Fang, Yong
Format Journal Article
LanguageEnglish
Published New York IEEE 01.07.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Deep learning provides a great potential for in-loop filtering to improve both coding efficiency and subjective quality in video coding. State-of-the-art work focuses on network structure design and employs a single powerful network to solve all problems. In contrast, this paper proposes a deep learning based systematic approach that includes an effective Convolutional Neural Network (CNN) structure, a hierarchical training strategy, and a video codec oriented switchable mechanism. First, we propose a novel CNN structure, i.e., Squeeze-and-Excitation Filtering CNN (SEFCNN), as an optional in-loop filter. To capture the non-linear interaction between channels, the SEFCNN is comprised of two subnets, i.e., Feature EXtracting (FEX) subnet and Feature ENhancing (FEN) subnet. Then, we develop a hierarchical model training strategy to adapt the two subnets to different coding scenarios. For high-rate videos with small artifacts, we train a single global model using the FEX for all types of frames, whereas for low-rate videos with large artifacts, different models are trained using both FEX and FEN for different types of frames. Finally, we propose an adaptive enhancing mechanism which is switchable between the CNN-based and the conventional methods. We selectively apply the CNN model to some frames or some regions in a frame. Experimental results show that the proposed scheme outperforms state-of-the-art work in coding efficiency, while the computational complexity is acceptable after GPU acceleration.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1051-8215
1558-2205
DOI:10.1109/TCSVT.2019.2935508