Memory bandwidth optimization of SpMV on GPGPUs

Bibliographic Details
Published in Frontiers of Computer Science Vol. 9; no. 3; pp. 431-441
Main Authors YAN, Chenggang Clarence, YU, Hui, XU, Weizhi, ZHANG, Yingping, CHEN, Bochuan, TIAN, Zhu, WANG, Yuxuan, YIN, Jian
Format Journal Article
Language English
Published Beijing Higher Education Press 01.06.2015
Springer Nature B.V
Abstract Improving the performance of sparse matrix-vector multiplication (SpMV) is important but difficult because of its irregular memory access. A general-purpose GPU (GPGPU) provides high computing ability and substantial bandwidth, which SpMV cannot fully exploit because of this irregularity. In this paper, we propose two novel methods to optimize memory bandwidth for SpMV on GPGPUs. First, a new storage format is proposed to exploit the memory bandwidth of the GPU architecture more efficiently. The new storage format ensures that as many non-zeros as possible are kept in a form suitable for exploiting the memory bandwidth of the GPU. Second, we propose a cache blocking method to improve the performance of SpMV on the GPU architecture. The sparse matrix is partitioned into sub-blocks that are stored in CSR format. With the blocking method, the corresponding part of vector x can be reused in the GPU cache, so the time spent accessing global memory for vector x is greatly reduced. Experiments are carried out on three GPU platforms: GeForce 9800 GX2, GeForce GTX 480, and Tesla K40. Experimental results show that both new methods efficiently improve the utilization of GPU memory bandwidth and the performance of the GPU.
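The cache blocking idea in the abstract can be illustrated with a small sketch. The code below is not the authors' GPU implementation and does not reproduce their new storage format, which the abstract does not specify. It is a plain-Python, CPU-side illustration, assuming a simple vertical (column-wise) partition: the matrix is split into sub-blocks that are each stored in CSR, so each sub-block reads only one contiguous slice of x. All function names and the `block_width` parameter are hypothetical.

```python
# Illustrative sketch only: CSR SpMV and a column-blocked variant.
# Names (csr_spmv, split_into_column_blocks, blocked_spmv, block_width)
# are hypothetical and are not taken from the paper.

def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Non-zeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def split_into_column_blocks(row_ptr, col_idx, values, n_cols, block_width):
    """Partition A into vertical sub-blocks, each with its own CSR arrays.

    Column indices are stored relative to the block's first column, so a
    block only ever touches x[start:end].
    """
    n_rows = len(row_ptr) - 1
    blocks = []
    for start in range(0, n_cols, block_width):
        end = min(start + block_width, n_cols)
        b_ptr, b_idx, b_val = [0], [], []
        for i in range(n_rows):
            for k in range(row_ptr[i], row_ptr[i + 1]):
                if start <= col_idx[k] < end:
                    b_idx.append(col_idx[k] - start)
                    b_val.append(values[k])
            b_ptr.append(len(b_idx))
        blocks.append((start, end, b_ptr, b_idx, b_val))
    return blocks


def blocked_spmv(blocks, n_rows, x):
    """Accumulate y block by block; each block reuses one small slice of x."""
    y = [0.0] * n_rows
    for start, end, b_ptr, b_idx, b_val in blocks:
        x_slice = x[start:end]  # the part of x that would stay cache-resident
        partial = csr_spmv(b_ptr, b_idx, b_val, x_slice)
        for i in range(n_rows):
            y[i] += partial[i]
    return y


# Example 3x4 matrix:
# [[1, 0, 2, 0],
#  [0, 3, 0, 0],
#  [4, 0, 5, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 1, 0, 2, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

y_plain = csr_spmv(row_ptr, col_idx, values, x)
blocks = split_into_column_blocks(row_ptr, col_idx, values, n_cols=4, block_width=2)
y_blocked = blocked_spmv(blocks, n_rows=3, x=x)
```

Both routines compute the same product. The blocked version only changes the order in which x is read, which is the property a cache blocking scheme relies on. On a GPU the block width would presumably be tuned to the cache size of the target device, a detail this sketch leaves out.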
Author Chenggang Clarence YAN; Hui YU; Weizhi XU; Yingping ZHANG; Bochuan CHEN; Zhu TIAN; Yuxuan WANG; Jian YIN
AuthorAffiliation Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Institute of Microelectronics, Tsinghua University, Beijing 100084, China; Automation Department, Tsinghua University, Beijing 100084, China; State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; State Grid Information & Communication Company of Hunan EPC, Changsha 410007, China; Department of Computer, Shandong University, Weihai 250101, China
Author_xml – sequence: 1
  givenname: Chenggang Clarence
  surname: YAN
  fullname: YAN, Chenggang Clarence
  organization: Automation Department, Tsinghua University, Beijing 100084, China
– sequence: 2
  givenname: Hui
  surname: YU
  fullname: YU, Hui
  organization: Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
– sequence: 3
  givenname: Weizhi
  surname: XU
  fullname: XU, Weizhi
  email: weizhixu@gmail.com
  organization: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
– sequence: 4
  givenname: Yingping
  surname: ZHANG
  fullname: ZHANG, Yingping
  organization: State Grid Information & Communication Company of Hunan EPC, Changsha 410007, China
– sequence: 5
  givenname: Bochuan
  surname: CHEN
  fullname: CHEN, Bochuan
  organization: State Grid Information & Communication Company of Hunan EPC, Changsha 410007, China
– sequence: 6
  givenname: Zhu
  surname: TIAN
  fullname: TIAN, Zhu
  organization: Department of Computer, Shandong University, Weihai 250101, China
– sequence: 7
  givenname: Yuxuan
  surname: WANG
  fullname: WANG, Yuxuan
  organization: Department of Computer, Shandong University, Weihai 250101, China
– sequence: 8
  givenname: Jian
  surname: YIN
  fullname: YIN, Jian
  organization: Department of Computer, Shandong University, Weihai 250101, China
ContentType Journal Article
Copyright Copyright reserved, 2014, Higher Education Press and Springer-Verlag Berlin Heidelberg
Higher Education Press and Springer-Verlag Berlin Heidelberg 2015
DOI 10.1007/s11704-014-4127-1
Discipline Computer Science
DocumentTitleAlternate Memory bandwidth optimization of SpMV on GPGPUs
EISSN 2095-2236
EndPage 441
ISSN 2095-2228
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 3
Keywords memory bandwidth
SpMV
cache blocking
performance tuning
GPGPU
Language English
License This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Notes 11-5731/TP
Document received on: 2014-05-15
Document accepted on: 2014-06-10
OpenAccessLink https://journal.hep.com.cn/fcs/EN/10.1007/s11704-014-4127-1
PublicationCentury 2000
PublicationDate 2015-06-01
PublicationDateYYYYMMDD 2015-06-01
PublicationDate_xml – month: 06
  year: 2015
  text: 2015-06-01
  day: 01
PublicationDecade 2010
PublicationPlace Beijing
PublicationPlace_xml – name: Beijing
– name: Heidelberg
PublicationSubtitle Selected Publications from Chinese Universities
PublicationTitle Frontiers of Computer Science
PublicationTitleAbbrev Front. Comput. Sci
PublicationTitleAlternate Frontiers of Computer Science in China
PublicationYear 2015
Publisher Higher Education Press
Springer Nature B.V
Publisher_xml – name: Higher Education Press
– name: Springer Nature B.V
SubjectTerms Bandwidths
cache blocking
Computer architecture
Computer Science
Format
GeForce
GPGPU
GPU
Graphics processing units
Mathematical analysis
memory bandwidth
Performance enhancement
performance tuning
Research Article
Sparse matrices
Sparsity
SpMV
Memory (storage device)
Storage format
Bandwidth optimization
Sparse matrix
Cache
高速缓存
SummonAdditionalLinks – databaseName: ProQuest Central
  dbid: BENPR
  link: http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwfV1LSwMxEA4-LoL4FuuLPXhSgs1rd3MSFR8IlaJWvIXNY62gu62tiP_eyTbbUsHeFrLJYSaZfDPzZQahI5uQXDijMegeHJSYaZxmIsNSpIayzDkj_EPh1n182-F3L-IlBNwGgVZZ28TKUNvS-Bj5KZUEsL0Eo3vW62PfNcpnV0MLjXm0CCY4Bedr8eLqvv0wjrJQD5mrVAIFLIF9uKNObVbv50hSkTA45oQmmPgCC92yeO3DtTF1US13K9aFsxU7dQqN_kmgVvfS9RpaCYAyOh_tgHU054oNtFo3a4jC2d1Epy3Pqf2JdFbY7zc77EYlWIuP8AwzKvPosdd6juDzpn3T7gy2UOf66unyFoduCdjwmA4xSSW1xuUAGVzKMg2-Xgb-gG1SnYCTRbSjliaCpoawnHOZWCHBNc44Z1Y7Jtk2WijKwu2gSDIj8oTFOveeczORLhbaMgA_1Bf8yxtobywm1RtVxVBxzFPPGWUN1KwFp0woNO77XbyrSYlkL3cFclde7oo00PF4Sr3ejJ_JlDZU7ks9-Mbhs-bs1xpT4ZAO1GRLNdBJrcXJ8L-L7c5ebA8tAaoSIz7ZPloYfn65A0AuQ30YtucvP8vkPw
  priority: 102
  providerName: ProQuest
Title Memory bandwidth optimization of SpMV on GPGPUs
URI http://lib.cqvip.com/qk/71018X/201503/664841053.html
https://journal.hep.com.cn/fcs/EN/10.1007/s11704-014-4127-1
https://link.springer.com/article/10.1007/s11704-014-4127-1
https://www.proquest.com/docview/2918719341
Volume 9