Research on Performance Optimization of Stencil Computation for ARM64 Multi-core Microprocessors
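Stencil computation is an important class of computational kernels, widely found in image and video processing and in large-scale scientific and engineering computing. However, little research has addressed optimizing stencil computation performance on ARM64 high-performance processors. To parallelize and optimize typical stencil kernels on ARM64 multi-core microprocessors, this work proposes an optimization method based on two-dimensional binding, informed by the characteristics of the AMCC XGENE2 and Phytium FT1500A multi-core microprocessors. By binding threads to CPUs and binding threads to data blocks, the method reduces the parallel overhead of thread scheduling and increases the Cache hit rate. Experimental results show that the method improves stencil computation performance on ARM64 multi-core microprocessors and scales well on both ARM64 platforms. ...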
Published in | 计算机工程与科学 (Computer Engineering & Science), Vol. 39, no. 5, pp. 829-833 |
---|---|
Main Author | |
Format | Journal Article |
Language | Chinese |
Published | College of Computer, National University of Defense Technology, Changsha, Hunan 410073, 2017 |
Subjects | |
ISSN | 1007-130X |
DOI | 10.3969/j.issn.1007-130X.2017.05.002 |
Bibliography: | Stencil computation is an important class of computational kernels, widely used in fields ranging from image and video processing to large-scale scientific and engineering simulation. However, evaluations of stencil computation on ARM64 high-performance processors are rare. Based on the features of the AMCC X-GENE2 and Phytium FT-1500A, we design an optimization method based on two-dimensional binding, which reduces the parallel overhead of thread scheduling and increases the Cache hit rate through thread-CPU binding and thread-data-block binding. Experimental results show that this method improves the performance of stencil computation on the ARM64 architecture and that our kernel scales well on the two ARM64 multi-core microprocessor platforms. 43-1258/TP. FENG Lu-xia, LI Chun-jiang, HUANG Ya-bin (College of Computer, National University of Defense Technology, Changsha 410073, China). Keywords: stencil computation; ARM64; AMCC X-GENE2; FT-1500A; parallelism; threa |
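
The two-dimensional binding described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which this record does not reproduce. It assumes a 1D three-point averaging stencil with fixed boundary values, and the names `pin_to_cpu` and `run_stencil` are hypothetical. Thread-CPU binding uses `os.sched_setaffinity`, which is available only on Linux, so the sketch treats pinning as best effort. Thread-data-block binding is modeled by giving each thread one fixed contiguous block of the grid for every time step. Python threads do not run this loop in parallel because of the interpreter lock, so the sketch shows the structure of the method only, not its speedup.

```python
import os
import threading


def pin_to_cpu(cpu):
    """Best-effort thread-CPU binding; silently skipped where unsupported."""
    if hasattr(os, "sched_setaffinity"):
        try:
            # pid 0 refers to the calling thread on Linux.
            os.sched_setaffinity(0, {cpu})
        except OSError:
            pass


def run_stencil(grid, steps, num_threads):
    """Apply a 3-point averaging stencil `steps` times using fixed blocks."""
    n = len(grid)
    # Two buffers (Jacobi style); boundary cells are never rewritten.
    buffers = [list(grid), list(grid)]
    barrier = threading.Barrier(num_threads)

    # Thread-data-block binding: split the interior cells 1..n-2 into
    # contiguous blocks, one per thread, fixed for the whole run.
    interior = max(n - 2, 0)
    bounds = [1 + interior * t // num_threads for t in range(num_threads + 1)]

    # CPUs this process may use (assumption: one worker per CPU, wrapping).
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))
    else:
        cpus = []

    def worker(tid):
        if cpus:
            pin_to_cpu(cpus[tid % len(cpus)])
        lo, hi = bounds[tid], bounds[tid + 1]
        for step in range(steps):
            src = buffers[step % 2]
            dst = buffers[(step + 1) % 2]
            for i in range(lo, hi):
                dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
            # All threads finish a time step before any starts the next.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buffers[steps % 2]
```

Because every thread keeps the same block across time steps and stays on the same core, the data it touched in one step is the data it needs in the next, which is the intuition behind the improved Cache hit rate claimed in the abstract.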