Memory bandwidth optimization of SpMV on GPGPUs

It is an important task to improve performance for sparse matrix vector multiplication (SpMV), and it is a difficult task because of its irregular memory access. General purpose GPU (GPGPU) provides high computing ability and substantial bandwidth that cannot be fully exploited by SpMV due to it...

Bibliographic Details
Published in	Frontiers of Computer Science Vol. 9; no. 3; pp. 431 - 441
Main Authors	YAN, Chenggang Clarence, YU, Hui, XU, Weizhi, ZHANG, Yingping, CHEN, Bochuan, TIAN, Zhu, WANG, Yuxuan, YIN, Jian
Format	Journal Article
Language	English
Published	Beijing: Higher Education Press; Springer Nature B.V., 01.06.2015
Subjects	Bandwidths; cache blocking; Computer architecture; Computer Science; Format; GeForce; GPGPU; GPU; Graphics processing units; Mathematical analysis; memory bandwidth; Performance enhancement; performance tuning; Research Article; Sparse matrices; Sparsity; SpMV; memory (storage device); storage format; bandwidth optimization; sparse matrix; cache
Abstract	It is an important task to improve performance for sparse matrix vector multiplication (SpMV), and it is a difficult task because of its irregular memory access. General purpose GPU (GPGPU) provides high computing ability and substantial bandwidth that cannot be fully exploited by SpMV due to its irregularity. In this paper, we propose two novel methods to optimize the memory bandwidth for SpMV on GPGPU. First, a new storage format is proposed to exploit memory bandwidth of GPU architecture more efficiently. The new storage format can ensure that there are as many non-zeros as possible in the format which is suitable to exploit the memory bandwidth of the GPU. Second, we propose a cache blocking method to improve the performance of SpMV on GPU architecture. The sparse matrix is partitioned into sub-blocks that are stored in CSR format. With the blocking method, the corresponding part of vector x can be reused in the GPU cache, so the time to access the global memory for vector x is reduced heavily. Experiments are carried out on three GPU platforms, GeForce 9800 GX2, GeForce GTX 480, and Tesla K40. Experimental results show that both new methods can efficiently improve the utilization of GPU memory bandwidth and the performance of the GPU.
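The cache blocking idea in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' GPU kernel: the abstract does not say how the sub-blocks are shaped, so the sketch assumes vertical column strips, and every name in it (csr_spmv, split_into_column_blocks, blocked_spmv, block_width) is hypothetical. It only shows that splitting a CSR matrix into sub-blocks, each stored in CSR and each touching one contiguous slice of x, gives the same product as plain CSR SpMV.

```python
# Illustrative sketch only: plain-Python CSR SpMV and a column-blocked variant.
# All function and parameter names are hypothetical, not taken from the paper.
# Assumption: sub-blocks are vertical column strips of the matrix.

def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # Non-zeros of row i live in positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def split_into_column_blocks(row_ptr, col_idx, vals, n_cols, block_width):
    """Partition A into column strips, each stored as its own CSR matrix.

    Column indices inside a strip are relative to the strip's first column,
    so each strip reads only one contiguous slice of x.
    """
    n_rows = len(row_ptr) - 1
    blocks = []
    for start in range(0, n_cols, block_width):
        end = min(start + block_width, n_cols)
        b_ptr, b_col, b_val = [0], [], []
        for i in range(n_rows):
            for k in range(row_ptr[i], row_ptr[i + 1]):
                if start <= col_idx[k] < end:
                    b_col.append(col_idx[k] - start)
                    b_val.append(vals[k])
            b_ptr.append(len(b_col))
        blocks.append((start, end, b_ptr, b_col, b_val))
    return blocks


def blocked_spmv(blocks, x, n_rows):
    """Accumulate y block by block; each block reuses one slice of x."""
    y = [0.0] * n_rows
    for start, end, b_ptr, b_col, b_val in blocks:
        x_slice = x[start:end]  # the part of x this block would keep in cache
        partial = csr_spmv(b_ptr, b_col, b_val, x_slice)
        for i in range(n_rows):
            y[i] += partial[i]
    return y


# Small 3x4 example matrix:
# [[1, 0, 2, 0],
#  [0, 3, 0, 4],
#  [5, 0, 0, 6]]
row_ptr = [0, 2, 4, 6]
col_idx = [0, 2, 1, 3, 0, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

y_plain = csr_spmv(row_ptr, col_idx, vals, x)
blocks = split_into_column_blocks(row_ptr, col_idx, vals, n_cols=4, block_width=2)
y_blocked = blocked_spmv(blocks, x, n_rows=3)
```

In this sketch block_width plays the role of the tuning parameter: on a GPU it would presumably be chosen so that one slice of x fits in the cache, but the value used here is arbitrary and the paper's actual choice is not given in this record.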
Author	Chenggang Clarence YAN; Hui YU; Weizhi XU; Yingping ZHANG; Bochuan CHEN; Zhu TIAN; Yuxuan WANG; Jian YIN
AuthorAffiliation	Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Institute of Microelectronics, Tsinghua University, Beijing 100084, China; Automation Department, Tsinghua University, Beijing 100084, China; State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; State Grid Information & Communication Company of Hunan EPC, Changsha 410007, China; Department of Computer, Shandong University, Weihai 250101, China
Author_xml	1. YAN, Chenggang Clarence (Automation Department, Tsinghua University, Beijing 100084, China); 2. YU, Hui (Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China); 3. XU, Weizhi (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; email: weizhixu@gmail.com); 4. ZHANG, Yingping (State Grid Information & Communication Company of Hunan EPC, Changsha 410007, China); 5. CHEN, Bochuan (State Grid Information & Communication Company of Hunan EPC, Changsha 410007, China); 6. TIAN, Zhu (Department of Computer, Shandong University, Weihai 250101, China); 7. WANG, Yuxuan (Department of Computer, Shandong University, Weihai 250101, China); 8. YIN, Jian (Department of Computer, Shandong University, Weihai 250101, China)
CitedBy_id	crossref_primary_10_1002_cpe_8366 crossref_primary_10_1080_09728600_2022_2148589 crossref_primary_10_1007_s11227_024_05949_6 crossref_primary_10_1007_s11227_015_1571_0
Cites_doi	10.1109/IPDPS.2011.73 10.1109/LSP.2014.2310494 10.1145/2038037.1941587 10.1016/j.parco.2008.12.006 10.1109/ICCIS.2010.285 10.1109/TCSVT.2014.2380232 10.1109/TMM.2012.2190391 10.1145/1183401.1183444 10.1145/1837853.1693471 10.1007/978-3-642-11515-8_10 10.14778/1938545.1938548 10.1145/882262.882364 10.1049/el.2014.0611
ContentType	Journal Article
Copyright	Copyright reserved, 2014, Higher Education Press and Springer-Verlag Berlin Heidelberg; Higher Education Press and Springer-Verlag Berlin Heidelberg 2015.
DOI	10.1007/s11704-014-4127-1
Discipline	Computer Science
DocumentTitleAlternate	Memory bandwidth optimization of SpMV on GPGPUs
EISSN	2095-2236
EndPage	441
ExternalDocumentID	10_1007_s11704_014_4127_1 10.1007/s11704-014-4127-1 664841053
ISSN	2095-2228
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	3
Keywords	memory bandwidth SpMV cache blocking performance tuning GPGPU
Language	English
License	This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Notes	11-5731/TP. Keywords: GPGPU, performance tuning, SpMV, cache blocking, memory bandwidth. Document received on: 2014-05-15. Document accepted on: 2014-06-10.
OpenAccessLink	https://journal.hep.com.cn/fcs/EN/10.1007/s11704-014-4127-1
PageCount	11
PublicationCentury	2000
PublicationDate	2015-06-01
PublicationDateYYYYMMDD	2015-06-01
PublicationDate_xml	– month: 06 year: 2015 text: 2015-06-01 day: 01
PublicationDecade	2010
PublicationPlace	Beijing
PublicationPlace_xml	– name: Beijing – name: Heidelberg
PublicationSubtitle	Selected Publications from Chinese Universities
PublicationTitle	Frontiers of Computer Science
PublicationTitleAbbrev	Front. Comput. Sci
PublicationTitleAlternate	Frontiers of Computer Science in China
PublicationYear	2015
Publisher	Higher Education Press Springer Nature B.V
Publisher_xml	– name: Higher Education Press – name: Springer Nature B.V
StartPage	431
Title	Memory bandwidth optimization of SpMV on GPGPUs
URI	http://lib.cqvip.com/qk/71018X/201503/664841053.html https://journal.hep.com.cn/fcs/EN/10.1007/s11704-014-4127-1 https://link.springer.com/article/10.1007/s11704-014-4127-1 https://www.proquest.com/docview/2918719341
Volume	9