Efficient Breadth-First Search on the Cell/BE Processor
Published in | IEEE Transactions on Parallel and Distributed Systems, Vol. 19, no. 10, pp. 1381-1395 |
---|---|
Main Authors | Scarpazza, D.P.; Villa, O.; Petrini, F. |
Format | Journal Article |
Language | English |
Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 October 2008 |
Abstract | Multi-core processors are a shift of paradigm in computer architecture that promises a dramatic increase in performance. But they also bring an unprecedented level of complexity in algorithmic design and software development. In this paper we describe the challenges involved in designing a breadth-first search (BFS) algorithm for the Cell/B.E. processor. The proposed methodology combines a high-level algorithmic design that captures the machine-independent aspects, to guarantee portability with performance to future processors, with an implementation that embeds processor-specific optimizations. Using a fine-grained global coordination strategy derived from the bulk-synchronous parallel (BSP) model, we have determined an accurate performance model that has guided the implementation and the optimization of our algorithm. Our experiments on a pre-production Cell/B.E. board running at 3.2 GHz show almost linear speedups when using multiple synergistic processing elements, and an impressive level of performance when compared to other processors. On graphs that offer sufficient parallelism, the Cell/B.E. is typically an order of magnitude faster than conventional processors, such as the AMD Opteron and the Intel Pentium 4 and Woodcrest, and custom-designed architectures, such as the MTA-2 and BlueGene/L. |
---|---|
Author | Scarpazza, D.P.; Villa, O.; Petrini, F. |
Author affiliation | Scarpazza, D.P.: Cell Solutions Dept., IBM T.J. Watson Res. Center, Yorktown Heights, NY |
CODEN | ITDSEO |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2008 |
DOI | 10.1109/TPDS.2007.70811 |
Discipline | Engineering Computer Science Architecture |
EISSN | 1558-2183 |
EndPage | 1395 |
Genre | orig-research |
ISSN | 1045-9219 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 10 |
Keywords | Communication/Networking and Information Technology; Performance of Systems; Emerging technologies |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
PageCount | 15 |
PublicationCentury | 2000 |
PublicationDate | 2008-10-01 |
PublicationDateYYYYMMDD | 2008-10-01 |
PublicationDecade | 2000 |
PublicationPlace | New York |
PublicationTitle | IEEE transactions on parallel and distributed systems |
PublicationTitleAbbrev | TPDS |
PublicationYear | 2008 |
Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 1381 |
SubjectTerms | Algorithm design and analysis; Algorithms; Architecture; Communication/Networking and Information Technology; Computer architecture; Computer Society; Design engineering; Emerging technologies; Energy consumption; Engines; Frequency; Mathematical models; Microprocessors; Optimization; Parallel processing; Performance of Systems; Process design; Programming; Searching; Software algorithms; Software development; Studies |
Title | Efficient Breadth-First Search on the Cell/BE Processor |
URI | https://ieeexplore.ieee.org/document/4407692 https://www.proquest.com/docview/912269898 https://www.proquest.com/docview/34410718 https://www.proquest.com/docview/875067590 https://www.proquest.com/docview/903623757 |
Volume | 19 |