Efficient motion estimation and discrete cosine transform implementation using the graphics processing units

Bibliographic Details
Published in PloS one Vol. 19; no. 8; p. e0307217
Main Authors Agha, Shahrukh; Jan, Farmanullah; Khan, Haroon Ahmed; Kaleem, Muhammad; Khan, Mansoor
Format Journal Article
Language English
Published United States Public Library of Science 28.08.2024
Summary: Motion estimation (ME) and the two-dimensional discrete cosine transform (2D-DCT) are both computationally expensive parts of the HEVC standard, so real-time HEVC encoding may not be free from glitches. To address this issue, this study uses graphics processing units (GPUs) to perform the ME and 2D-DCT tasks. The authors examine four levels of parallelism in ME: frame, macroblock, search area, and sum of absolute differences (SAD). For comparative analysis, they consider three ME algorithms: full search (FS), the test zone search (TZS) of HEVC, and hierarchical diamond search (EHDS). Two levels of parallelism, macroblock and sub-macroblock, are likewise explored in the 2D-DCT, which is computed with a multithreaded implementation of the Loeffler DCT algorithm, chosen as the least computationally complex option. Experimental results show that on the NVIDIA GeForce GTX 1080, ME for 25 frames of 3840×2160 pixels each completes in 0.15 seconds, while the 2D-DCT, together with image reconstruction and differencing, for 25 frames takes 0.1 seconds. Together, the ME and 2D-DCT tasks take 0.25 seconds, which leaves 0.75 seconds of a one-second budget for the remaining parts of the encoder. On this basis, the authors conclude that the resulting encoder can safely be used in real-time applications.
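To make the ME parallelism levels named in the summary concrete, the following is a minimal sequential sketch of full-search block matching with SAD as the cost. It is not the authors' GPU kernel: the function names `sad` and `full_search`, the list-of-rows frame layout, and the tiny 8×8 test frames are all assumptions made for illustration. The point is only that every candidate displacement (search-area level) and every block (macroblock level) can be evaluated independently, and that each SAD is itself a reduction over independent pixel differences (SAD level).

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the n x n block of `ref` at (rx, ry).
    Frames are lists of rows of integer luma samples (assumed layout)."""
    total = 0
    for dy in range(n):
        cur_row = cur[by + dy]
        ref_row = ref[ry + dy]
        for dx in range(n):
            total += abs(cur_row[bx + dx] - ref_row[rx + dx])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement in [-search_range, search_range]^2 that keeps
    the candidate block inside `ref`; return (best_sad, (mvx, mvy))."""
    height, width = len(ref), len(ref[0])
    best = None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = bx + mvx, by + mvy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate falls outside the reference frame
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, (mvx, mvy))
    return best


# Hypothetical 8x8 test data: `cur` is `ref` shifted by (2, 1), so the block
# at (2, 2) in `cur` should match `ref` at displacement (2, 1) with SAD 0.
ref = [[x * 7 + y * 13 for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
result = full_search(cur, ref, 2, 2, 4, 2)
```

On a GPU, one plausible mapping (again an assumption, not taken from the record) is one thread block per macroblock and one thread per candidate displacement, with the inner SAD loop replaced by a parallel reduction.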
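The 2D-DCT can be sketched in the same spirit. The summary names the Loeffler algorithm; the example below deliberately uses the direct DCT-II definition instead, because it is short enough to verify by hand. A Loeffler-style factorization computes the same 8-point transform with far fewer multiplications. The names `dct_1d` and `dct_2d` and the orthonormal scaling are assumptions for illustration. The sketch shows the row-column separability of the transform: each row pass and each column pass is independent of the others, which is the kind of independence a sub-macroblock level of parallelism can exploit.

```python
import math


def dct_1d(v):
    """Orthonormal 1D DCT-II of a sequence, computed directly from the
    definition (O(n^2)); not the Loeffler factorization."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]               # row pass
    cols = [dct_1d(list(c)) for c in zip(*rows)]    # column pass on the transpose
    return [list(r) for r in zip(*cols)]            # transpose back


# A constant 8x8 block has all of its energy in the DC coefficient.
flat = [[10] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
```

For the constant block above, the DC coefficient is 10 × 64 / 8 = 80 under the orthonormal scaling, and all other coefficients vanish up to floating-point rounding.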
Competing Interests: The authors have declared that no competing interests exist.
ISSN:1932-6203
DOI:10.1371/journal.pone.0307217