Parallelism Analysis of Prominent Desktop Applications: An 18-Year Perspective

Bibliographic Details
Published in: 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 202-211
Main Authors: Feng, Siying; Pal, Subhankar; Yang, Yichen; Dreslinski, Ronald G.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.03.2019

Summary: Improvements in clock speed and exploitation of Instruction-Level Parallelism (ILP) hit a roadblock during the mid-2000s. This, coupled with the demise of Dennard scaling, led to the rise of multi-core machines. Today, multi-core processors are ubiquitous, and architects have moved to specialization to work around the walls hit by single-core performance and chip Thermal Design Power (TDP). The pressure of innovation in the aftermath of Dennard scaling is shifting to software developers, who are required to write programs that make the most effective use of the underlying hardware. This work presents quantitative and qualitative analyses of how software has evolved to reap the benefits of multi-core and heterogeneous computers, compared to state-of-the-art systems in 2000 and 2010. We study a wide spectrum of commonly used applications on a state-of-the-art desktop machine and analyze two important metrics: Thread-Level Parallelism (TLP) and GPU utilization. We compare the results to prior work from the last two decades, which states that 2-3 CPU cores are sufficient for most applications and that the GPU is usually under-utilized. Our analyses show that the harnessed parallelism has improved and that emerging workloads show good utilization of hardware resources. The average TLP across the applications we study is 3.1, with most applications attaining the maximum instantaneous TLP of 12 during execution. The GPU is over-provisioned for most applications, but workloads such as cryptocurrency mining utilize it to the fullest. Overall, we conclude that the effectiveness of software in utilizing the underlying hardware has improved, but there is still scope for optimization.
DOI: 10.1109/ISPASS.2019.00033
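
Note: the abstract does not spell out how TLP is measured. In the earlier desktop TLP studies this work compares against (e.g., Flautner et al.), TLP is conventionally the average number of concurrently running threads over the intervals in which the machine is not fully idle. A minimal Python sketch of that computation, assuming per-interval counts of busy cores as input (the function name and sample trace below are illustrative, not taken from the paper):

    def thread_level_parallelism(active_core_samples):
        """Average number of concurrently busy cores, ignoring fully idle intervals."""
        busy = [n for n in active_core_samples if n > 0]
        if not busy:
            return 0.0
        return sum(busy) / len(busy)

    # Hypothetical trace from a 12-core machine sampled over 8 intervals.
    samples = [0, 1, 4, 12, 2, 0, 3, 2]
    print(thread_level_parallelism(samples))  # 4.0

Read this way, the reported average TLP of 3.1 means that roughly three cores are busy, on average, whenever the machine is not idle.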