Optimisation of HEVC motion estimation exploiting SAD and SSD GPU-based implementation

Bibliographic Details
Published in	IET Image Processing, Vol. 12, no. 2, pp. 243-253
Main Authors	Khemiri, Randa; Kibeya, Hassan; Sayadi, Fatma Ezahra; Bahri, Nejmeddine; Atri, Mohamed; Masmoudi, Nouri
Format	Journal Article
Language	English
Published	The Institution of Engineering and Technology, 01.02.2018
Subjects	compute unified device architecture language; Computer Science; Fermi architecture; graphics processing unit; graphics processing units; Hardware Architecture; HEVC complexity; HEVC motion estimation; HEVC standard; motion estimation; NVIDIA GPU; Research Article; SAD; signal-to-noise ratio loss; SSD GPU-based implementation; sum of square differences; video coding
Summary:	The High-Efficiency Video Coding (HEVC) standard doubles the compression ratio of its predecessor, H.264/AVC, at the same video quality. This gain comes at the cost of higher encoder computational complexity, which makes HEVC complexity a crucial subject. The most time-consuming and computationally intensive part of HEVC is motion estimation, which relies mainly on the sum of absolute differences (SAD) or the sum of square differences (SSD). The authors therefore propose an implementation of these algorithms on a low-cost NVIDIA graphics processing unit (GPU) of the Fermi architecture, developed in the Compute Unified Device Architecture (CUDA) language. The proposed algorithm is based on the parallel-difference and parallel-reduction processes. Experimental results show a significant speed-up in execution time for most 64 × 64 pixel blocks: compared to the CPU, the proposed parallel algorithm reduces execution time by up to 56.17% for SAD and 30.4% for SSD. This improvement indicates that parallelising the algorithm with the newly proposed reduction process for the Fermi GPU generation leads to better results. These findings are based on a static study that determines the percentage utilisation of each prediction unit (PU) dimension in HEVC. The study shows that the larger PUs are the most utilised in temporal levels 3 and 4, reaching 84.56% for class E. The improvement is accompanied by an average peak signal-to-noise ratio loss of 0.095 dB and a 0.64% decrease in bit rate.
ISSN:	1751-9659, 1751-9667
DOI:	10.1049/iet-ipr.2017.0474
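
The summary describes the block-matching cost as a parallel-difference step followed by a parallel reduction. The authors' CUDA kernel is not reproduced in this record, so the following is only a serial Python sketch of that two-step pattern, under stated assumptions: the function name `block_cost`, its parameters, and the halving-stride tree reduction are illustrative choices, not the paper's proposed reduction process.

```python
def block_cost(cur, ref, metric="sad"):
    """Return the SAD or SSD between two equal-size pixel blocks.

    `cur` and `ref` are lists of rows of integer pixel values.
    Hypothetical helper for illustration; not the paper's kernel.
    """
    # Step 1, "parallel difference": every per-pixel term is independent,
    # so on a GPU each thread could compute one of these values.
    if metric == "sad":
        terms = [abs(c - r) for cr, rr in zip(cur, ref) for c, r in zip(cr, rr)]
    elif metric == "ssd":
        terms = [(c - r) ** 2 for cr, rr in zip(cur, ref) for c, r in zip(cr, rr)]
    else:
        raise ValueError("metric must be 'sad' or 'ssd'")

    # Pad with zeros to a power-of-two length so the tree halves cleanly.
    n = 1
    while n < len(terms):
        n *= 2
    terms += [0] * (n - len(terms))

    # Step 2, "parallel reduction": at each level the first `stride`
    # slots each add in one partner slot; the additions within a level
    # are independent, which is what a GPU would run concurrently.
    stride = n // 2
    while stride > 0:
        for i in range(stride):
            terms[i] += terms[i + stride]
        stride //= 2
    return terms[0]


# Usage: a 64 x 64 block, the size highlighted in the summary.
cur = [[10] * 64 for _ in range(64)]
ref = [[7] * 64 for _ in range(64)]
print(block_cost(cur, ref, "sad"), block_cost(cur, ref, "ssd"))
```

A 64 × 64 block has 4096 terms, so the tree finishes in 12 levels instead of 4095 sequential additions; that depth reduction is the general motivation for a reduction on a GPU, independent of the specific scheme the authors propose.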