Performance Optimization of Stencil Computation on ARM64 Multi-core Microprocessors

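Stencil computation is an important class of computational kernels, widely found in image and video processing and in large-scale scientific and engineering computing. However, few studies have addressed the optimization of stencil computation performance on high-performance ARM64 processors. To parallelize and optimize typical stencil kernels on ARM64 multi-core microprocessors, and based on the characteristics of the AMCC X-GENE2 and Phytium FT-1500A multi-core microprocessors, the authors propose an optimization method based on two-dimensional binding. By binding threads to CPUs and binding threads to data blocks, the method reduces the parallel overhead of thread scheduling and increases the cache hit rate. Experimental results show that the method improves stencil computation performance on ARM64 multi-core microprocessors and scales well on both ARM64 multi-core platforms.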
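The two bindings named in the abstract can be illustrated with a small sketch. This record does not include the paper's code, so everything below is an assumption made for illustration: the function names (`jacobi_step`, `split_rows`, `run_stencil`) are hypothetical, the kernel is a generic 5-point Jacobi stencil, and the row-block partitioning is one plausible reading of "thread-data-block binding". Thread-CPU binding uses `os.sched_setaffinity`, which is available only on Linux; elsewhere the sketch skips pinning. Python threads do not run the arithmetic in parallel, so this shows the structure of the approach, not its performance.

```python
import os
import threading


def jacobi_step(src, dst, row_begin, row_end):
    """Apply a 5-point stencil to interior columns of rows [row_begin, row_end)."""
    n_cols = len(src[0])
    for i in range(row_begin, row_end):
        for j in range(1, n_cols - 1):
            dst[i][j] = 0.25 * (src[i - 1][j] + src[i + 1][j]
                                + src[i][j - 1] + src[i][j + 1])


def split_rows(n_rows, n_threads):
    """Split interior rows 1..n_rows-2 into contiguous, near-equal blocks."""
    base, extra = divmod(n_rows - 2, n_threads)
    blocks, start = [], 1
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def _worker(cpu, block, a, b, n_steps, barrier):
    # Binding 1 (thread to CPU): pin the calling thread, if the OS allows it.
    if cpu is not None:
        try:
            os.sched_setaffinity(0, {cpu})
        except OSError:
            pass  # pinning is best-effort in this sketch
    # Binding 2 (thread to data block): the same rows in every time step.
    src, dst = a, b
    for _ in range(n_steps):
        jacobi_step(src, dst, block[0], block[1])
        barrier.wait()  # all blocks finish a step before any starts the next
        src, dst = dst, src


def run_stencil(grid, n_threads, n_steps):
    """Run n_steps Jacobi sweeps; boundary values stay fixed."""
    a = [row[:] for row in grid]
    b = [row[:] for row in grid]
    has_affinity = hasattr(os, "sched_getaffinity")
    cpus = sorted(os.sched_getaffinity(0)) if has_affinity else []
    barrier = threading.Barrier(n_threads)
    threads = []
    for tid, block in enumerate(split_rows(len(grid), n_threads)):
        cpu = cpus[tid % len(cpus)] if cpus else None
        th = threading.Thread(target=_worker,
                              args=(cpu, block, a, b, n_steps, barrier))
        th.start()
        threads.append(th)
    for th in threads:
        th.join()
    return a if n_steps % 2 == 0 else b
```

Because each thread keeps both its CPU and its row block for the whole run, the rows it updated in one step are the most likely to still be in that core's cache in the next step, which is the effect the abstract attributes to the method.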

Bibliographic Details
Published in	计算机工程与科学 (Computer Engineering & Science) Vol. 39; no. 5; pp. 829-833
Main Author	FENG Lu-xia; LI Chun-jiang; HUANG Ya-bin
Format	Journal Article
Language	Chinese
Published	College of Computer, National University of Defense Technology, Changsha, Hunan 410073, 2017
Subjects	stencil computation; ARM64; AMCC X-GENE2; FT-1500A; parallelism; thread bound
ISSN	1007-130X
DOI	10.3969/j.issn.1007-130X.2017.05.002

Bibliography:	Stencil computation is a class of important calculation kernels widely used in the field ranging from image and video processing to large-scale scientific and engineering simulation and calculation. However, the evaluation of stencil computation on the ARM64 high-performance processor is rare. According to the features of AMCC X-GENE2 and Phytium FT-1500A, we design an optimization method based on two-dimension bound, which reduces the parallelism overheads of thread scheduling, and increases the Cache hit rate by the thread-CPU bound and thread-data-block bound. Experimental results show that this method can improve the performance of the stencil calculation on ARM64 architecture, and the results of our kernel demonstrate the good scalability on the two ARM64 multi-core microprocessor platforms. 43-1258/TP FENG Lu-xia, LI Chun-jiang, HUANG Ya-bin (College of Computer, National University of Defense Technology, Changsha 410073, China) stencil computation; ARM64; AMCC X-GENE2; FT-1500A; parallelism; threa