Efficient Breadth-First Search on the Cell/BE Processor

Bibliographic Details
Published in IEEE Transactions on Parallel and Distributed Systems, Vol. 19, no. 10, pp. 1381-1395
Main Authors Scarpazza, D.P., Villa, O., Petrini, F.
Format Journal Article
Language English
Published New York IEEE 01.10.2008
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Abstract Multi-core processors are a paradigm shift in computer architecture that promises a dramatic increase in performance. But they also bring an unprecedented level of complexity in algorithmic design and software development. In this paper we describe the challenges involved in designing a breadth-first search (BFS) algorithm for the Cell/B.E. processor. The proposed methodology combines a high-level algorithmic design that captures the machine-independent aspects, to guarantee portability with performance to future processors, with an implementation that embeds processor-specific optimizations. Using a fine-grained global coordination strategy derived from the bulk-synchronous parallel (BSP) model, we have determined an accurate performance model that has guided the implementation and the optimization of our algorithm. Our experiments on a pre-production Cell/B.E. board running at 3.2 GHz show almost linear speedups when using multiple synergistic processing elements, and an impressive level of performance when compared to other processors. On graphs that offer sufficient parallelism, the Cell/B.E. is typically an order of magnitude faster than conventional processors, such as the AMD Opteron and the Intel Pentium 4 and Woodcrest, and custom-designed architectures, such as the MTA-2 and BlueGene/L.
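Illustration The abstract refers to a BFS coordinated in the bulk-synchronous parallel (BSP) style. The sketch below is not the authors' implementation; it is a minimal, generic level-synchronous BFS meant only to show what a superstep looks like. The function name `bsp_bfs`, the `num_workers` parameter, and the round-robin split of the frontier are assumptions made for this example, and the workers are simulated sequentially rather than mapped to synergistic processing elements.

```python
def bsp_bfs(adj, root, num_workers=4):
    """Level-synchronous BFS; returns a dict mapping vertex -> BFS level.

    Each iteration of the while loop is one superstep: every (simulated)
    worker expands its share of the frontier independently, then all
    partial results are merged at a barrier before the next superstep.
    """
    level = {root: 0}
    frontier = [root]
    depth = 0
    while frontier:
        depth += 1
        # Computation phase: split the frontier round-robin among workers.
        partial = []
        for w in range(num_workers):
            found = set()
            for v in frontier[w::num_workers]:
                for u in adj.get(v, ()):
                    if u not in level:  # not visited in an earlier superstep
                        found.add(u)
            partial.append(found)
        # Barrier / exchange phase: merge discoveries and assign levels.
        next_frontier = set().union(*partial)
        for u in next_frontier:
            level[u] = depth
        frontier = list(next_frontier)
    return level


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
    print(bsp_bfs(graph, 0))
```

Merging the per-worker sets only at the end of a superstep means a vertex found by two workers is recorded once, which is the property a barrier-based design relies on. A real multi-core version would replace the inner loop over `w` with concurrent workers and an explicit synchronization step.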
Author Affiliation Scarpazza, D.P.: Cell Solutions Dept., IBM T.J. Watson Research Center, Yorktown Heights, NY
CODEN ITDSEO
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2008
DOI 10.1109/TPDS.2007.70811
Discipline Engineering
Computer Science
Architecture
EISSN 1558-2183
ISSN 1045-9219
IsPeerReviewed true
IsScholarly true
Issue 10
Keywords Communication/Networking and Information Technology
Performance of Systems
Emerging technologies
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
SubjectTerms Algorithm design and analysis
Algorithms
Architecture
Communication/Networking and Information Technology
Computer architecture
Computer Society
Design engineering
Emerging technologies
Energy consumption
Engines
Frequency
Mathematical models
Microprocessors
Optimization
Parallel processing
Performance of Systems
Process design
Programming
Searching
Software algorithms
Software development
Studies
URI https://ieeexplore.ieee.org/document/4407692