GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition
Processing-In-Memory (PIM) is an effective technique that reduces data movement by integrating processing units within memory. Recent advances in "big data" and 3D stacking technology make PIM a practical and viable solution for modern data processing workloads. It is exemplified b...
Published in | Proceedings - International Symposium on High-Performance Computer Architecture pp. 544 - 557 |
---|---|
Main Authors | Mingxing Zhang, Youwei Zhuo, Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, Xuehai Qian |
Format | Conference Proceeding |
Language | English |
Published | IEEE 01.02.2018 |
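Subjects | Bandwidth; Graph processing; Hybrid Memory Cube; Memory management; Organizations; Partitioning algorithms; Processing In Memory; Programming; Three-dimensional displays |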
Abstract | Processing-In-Memory (PIM) is an effective technique that reduces data movement by integrating processing units within memory. Recent advances in "big data" and 3D stacking technology make PIM a practical and viable solution for modern data processing workloads, as exemplified by recent research interest in PIM-based acceleration. Among these efforts, TESSERACT is a PIM-enabled parallel graph processing architecture based on Micron's Hybrid Memory Cube (HMC), one of the most prominent 3D-stacked memory technologies. It implements a Pregel-like vertex-centric programming model, so that users can develop programs with a familiar interface while taking advantage of PIM. Despite its orders-of-magnitude speedup over DRAM-based systems, TESSERACT generates excessive cross-cube communication through SerDes links, whose bandwidth is much lower than the aggregate local bandwidth of the HMCs. Our investigation indicates that this is because of the restricted data organization required by the vertex programming model. In this paper, we argue that a PIM-based graph processing system should take data organization as a first-order design consideration. Following this principle, we propose GraphP, a novel HMC-based software/hardware co-designed graph processing system that drastically reduces communication and energy consumption compared to TESSERACT. GraphP features three key techniques. 1) "Source-cut" partitioning, which fundamentally changes the cross-cube communication from one remote put per cross-cube edge to one update per replica. 2) "Two-phase Vertex Program", a programming model designed for the "source-cut" partitioning with two operations: GenUpdate and ApplyUpdate. 3) Hierarchical communication and overlapping, which further improve performance by exploiting unique opportunities offered by the proposed partitioning and programming model. We evaluate GraphP using a cycle-accurate simulator with 5 real-world graphs and 4 algorithms. The results show that it provides on average a 1.7x speedup and 89% energy saving compared to TESSERACT. |
---|---|
Author | Gao, Mingyu; Wang, Chao; Chen, Kang; Zhang, Mingxing; Qian, Xuehai; Zhuo, Youwei; Wu, Yongwei; Kozyrakis, Christos |
Author_xml | – sequence: 1 givenname: Mingxing surname: Zhang fullname: Zhang, Mingxing – sequence: 2 givenname: Youwei surname: Zhuo fullname: Zhuo, Youwei – sequence: 3 givenname: Chao surname: Wang fullname: Wang, Chao – sequence: 4 givenname: Mingyu surname: Gao fullname: Gao, Mingyu – sequence: 5 givenname: Yongwei surname: Wu fullname: Wu, Yongwei – sequence: 6 givenname: Kang surname: Chen fullname: Chen, Kang – sequence: 7 givenname: Christos surname: Kozyrakis fullname: Kozyrakis, Christos – sequence: 8 givenname: Xuehai surname: Qian fullname: Qian, Xuehai |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DOI | 10.1109/HPCA.2018.00053 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 153863659X 9781538636596 |
EISSN | 2378-203X |
EndPage | 557 |
ExternalDocumentID | 8327036 |
Genre | orig-research |
GroupedDBID | 29O 6IE 6IF 6IH 6IK 6IL 6IM 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI M43 OCL RIE RIL RNS |
ID | FETCH-LOGICAL-i175t-cb5439dfa3ecbdfed7444c63cccd961d6cdb45130a40530879463081411401823 |
IEDL.DBID | RIE |
IngestDate | Wed Aug 27 02:51:15 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i175t-cb5439dfa3ecbdfed7444c63cccd961d6cdb45130a40530879463081411401823 |
PageCount | 14 |
ParticipantIDs | ieee_primary_8327036 |
PublicationCentury | 2000 |
PublicationDate | 2018-Feb |
PublicationDateYYYYMMDD | 2018-02-01 |
PublicationDate_xml | – month: 02 year: 2018 text: 2018-Feb |
PublicationDecade | 2010 |
PublicationTitle | Proceedings - International Symposium on High-Performance Computer Architecture |
PublicationTitleAbbrev | HPCA |
PublicationYear | 2018 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0002951 |
Score | 2.5004237 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 544 |
SubjectTerms | Bandwidth; Graph processing; Hybrid Memory Cube; Memory management; Organizations; Partitioning algorithms; Processing In Memory; Programming; Three-dimensional displays |
Title | GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition |
URI | https://ieeexplore.ieee.org/document/8327036 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | IEEE |
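Note on the abstract's first technique: the claim that "source-cut" partitioning changes cross-cube communication "from one remote put per cross-cube edge to one update per replica" can be illustrated with a small counting sketch. This is not code from the paper. It assumes, as a reading of the abstract only, that a replica of a source vertex exists in every remote cube holding at least one destination of its out-edges. The function names, the toy graph, and the vertex-to-cube mapping are all hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source vertex, destination vertex)


def count_remote_puts(edges: List[Edge], cube_of: Dict[int, int]) -> int:
    """Per-edge scheme: one remote put for every edge that crosses cubes."""
    return sum(1 for u, v in edges if cube_of[u] != cube_of[v])


def count_replica_updates(edges: List[Edge], cube_of: Dict[int, int]) -> int:
    """Per-replica scheme (assumed reading of "source-cut"): one update for
    each distinct (source vertex, remote cube) pair, however many edges
    connect that source to vertices in that cube."""
    replicas = {
        (u, cube_of[v]) for u, v in edges if cube_of[u] != cube_of[v]
    }
    return len(replicas)


# Hypothetical toy graph: vertices 0-1 live in cube 0, vertices 2-4 in cube 1.
edges = [(0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (2, 0)]
cube_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}

print("remote puts (per cross-cube edge):", count_remote_puts(edges, cube_of))
print("updates (per replica):", count_replica_updates(edges, cube_of))
```

In this toy graph all six edges cross cubes, but only three (source, remote cube) pairs exist, so the per-replica count is half the per-edge count. The gap widens as a source vertex has more out-edges into the same remote cube.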