Optimisation of HEVC motion estimation exploiting SAD and SSD GPU-based implementation


Bibliographic Details
Published in: IET Image Processing, Vol. 12, no. 2, pp. 243-253
Main authors: Khemiri, Randa; Kibeya, Hassan; Sayadi, Fatma Ezahra; Bahri, Nejmeddine; Atri, Mohamed; Masmoudi, Nouri
Format: Journal Article
Language: English
Published: The Institution of Engineering and Technology, 01.02.2018
Summary: The new High-Efficiency Video Coding (HEVC) standard doubles the video compression ratio compared with the previous H.264/AVC standard at the same video quality. However, this gain comes at the cost of increased encoder computational complexity, which makes HEVC complexity a crucial research topic. The most time-consuming and computationally intensive part of HEVC encoding is motion estimation, which relies principally on the sum of absolute differences (SAD) or the sum of squared differences (SSD) algorithms. For these reasons, the authors propose an implementation of these algorithms on a low-cost NVIDIA graphics processing unit (GPU) of the Fermi architecture, developed in the Compute Unified Device Architecture (CUDA) language. The proposed algorithm is based on a parallel-difference step and a parallel-reduction process. The experimental results show a significant speed-up in execution time, particularly for 64 × 64 pixel blocks. Compared with the CPU implementation, the proposed parallel algorithm reduces execution time by up to 56.17% for SAD and 30.4% for SSD. This improvement shows that parallelising the algorithm with the newly proposed reduction process for the Fermi GPU generation leads to better results. These findings are based on a statistical study that determines the percentage utilisation of each prediction unit (PU) size in HEVC. This study shows that the larger PUs are the most used at temporal levels 3 and 4, where their utilisation reaches 84.56% for class E sequences. This improvement is accompanied by an average peak signal-to-noise ratio (PSNR) loss of 0.095 dB and a 0.64% decrease in bit rate.
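To make the two phases named in the summary concrete, here is a minimal C++ sketch; it is not the authors' CUDA code. It assumes 8-bit samples stored row-major in a single 64 × 64 block and computes SAD = Σ|C(i) − R(i)| and SSD = Σ(C(i) − R(i))² over all 4096 pixel pairs. The first loop stands in for the parallel-difference step (one pixel per GPU thread); the second performs a pairwise tree reduction whose stride halves every round, which is the usual shape of a GPU parallel reduction. Everything runs serially on the CPU, and the names `blockCost`, `sad`, `ssd`, `kBlock` and `kPixels` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical block geometry: one 64x64 block of 8-bit samples, row-major.
constexpr int kBlock = 64;
constexpr int kPixels = kBlock * kBlock;  // 4096 = 2^12, so halving always divides evenly

// Generic block cost: per-pixel cost followed by a tree reduction.
// CostFn maps a signed difference (cur - ref) to a non-negative cost.
template <typename CostFn>
std::uint64_t blockCost(const std::vector<std::uint8_t>& cur,
                        const std::vector<std::uint8_t>& ref, CostFn cost) {
  assert(cur.size() == static_cast<std::size_t>(kPixels));
  assert(ref.size() == static_cast<std::size_t>(kPixels));

  // Phase 1 ("parallel difference"): on a GPU, each index i would be one thread.
  std::vector<std::uint64_t> partial(kPixels);
  for (int i = 0; i < kPixels; ++i) {
    partial[i] = cost(static_cast<int>(cur[i]) - static_cast<int>(ref[i]));
  }

  // Phase 2 ("parallel reduction"): pairwise sums with a halving stride,
  // 12 rounds for 4096 elements. Within a round every iteration touches
  // distinct indices, so on a GPU each would map to one thread, with a
  // barrier between rounds.
  for (int stride = kPixels / 2; stride > 0; stride /= 2) {
    for (int i = 0; i < stride; ++i) {
      partial[i] += partial[i + stride];
    }
  }
  return partial[0];
}

// Sum of absolute differences.
std::uint64_t sad(const std::vector<std::uint8_t>& cur,
                  const std::vector<std::uint8_t>& ref) {
  return blockCost(cur, ref, [](int d) {
    return static_cast<std::uint64_t>(std::abs(d));
  });
}

// Sum of squared differences.
std::uint64_t ssd(const std::vector<std::uint8_t>& cur,
                  const std::vector<std::uint8_t>& ref) {
  return blockCost(cur, ref, [](int d) {
    return static_cast<std::uint64_t>(d * d);
  });
}
```

For example, with two constant blocks whose samples differ by 3 everywhere, `sad` returns 3 × 4096 = 12288 and `ssd` returns 9 × 4096 = 36864. On an actual GPU, each reduction round would typically run in shared memory with a `__syncthreads()` barrier between rounds; the sketch only reproduces the order of the additions, not the execution-time results reported above.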
ISSN: 1751-9659, 1751-9667
DOI: 10.1049/iet-ipr.2017.0474