GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition

Bibliographic Details
Published in	Proceedings - International Symposium on High-Performance Computer Architecture pp. 544 - 557
Main Authors	Zhang, Mingxing; Zhuo, Youwei; Wang, Chao; Gao, Mingyu; Wu, Yongwei; Chen, Kang; Kozyrakis, Christos; Qian, Xuehai
Format	Conference Proceeding
Language	English
Published	IEEE 01.02.2018
Subjects	Bandwidth; Graph processing; Hybrid Memory Cube; Memory management; Organizations; Partitioning algorithms; Processing In Memory; Programming; Three-dimensional displays

Abstract	Processing-In-Memory (PIM) is an effective technique that reduces data movement by integrating processing units within memory. Recent advances in "big data" and 3D stacking technology make PIM a practical and viable solution for modern data processing workloads, as exemplified by recent research interest in PIM-based acceleration. Among these efforts, TESSERACT is a PIM-enabled parallel graph processing architecture based on Micron's Hybrid Memory Cube (HMC), one of the most prominent 3D-stacked memory technologies. It implements a Pregel-like vertex-centric programming model, so that users can develop programs through a familiar interface while taking advantage of PIM. Despite orders-of-magnitude speedup over DRAM-based systems, TESSERACT generates excessive cross-cube communication through SerDes links, whose bandwidth is much lower than the aggregate local bandwidth of the HMCs. Our investigation indicates that this is caused by the restricted data organization required by the vertex programming model. In this paper, we argue that a PIM-based graph processing system should treat data organization as a first-order design consideration. Following this principle, we propose GraphP, a novel HMC-based software/hardware co-designed graph processing system that drastically reduces communication and energy consumption compared to TESSERACT. GraphP features three key techniques: 1) "source-cut" partitioning, which fundamentally changes cross-cube communication from one remote put per cross-cube edge to one update per replica; 2) the "Two-phase Vertex Program", a programming model designed for source-cut partitioning with two operations, GenUpdate and ApplyUpdate; and 3) hierarchical communication and overlapping, which further improve performance by exploiting opportunities unique to the proposed partitioning and programming model. We evaluate GraphP using a cycle-accurate simulator with 5 real-world graphs and 4 algorithms. The results show that it provides on average 1.7× speedup and 89% energy savings compared to TESSERACT.
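The communication claim in technique 1) can be illustrated by counting cross-cube messages for one iteration on a toy graph. The sketch below is a hypothetical model, not GraphP's implementation or its actual partitioner: it assumes vertices are assigned to cubes by `vertex % num_cubes` (the hypothetical `owner` helper), that under source-cut each edge is stored in the cube that owns its destination, and that a remote source vertex therefore keeps one replica in each cube it has edges into. The edge list and all function names are made up for illustration.

```python
# Hypothetical illustration: per-edge remote puts vs. per-replica updates.
# Toy graph, made up for this example: (source, destination) pairs.
EDGES = [(0, 1), (0, 3), (0, 5), (2, 4), (1, 2)]


def owner(vertex: int, num_cubes: int) -> int:
    """Hypothetical placement: assign vertex IDs to cubes by modulo."""
    return vertex % num_cubes


def per_edge_messages(edges, num_cubes: int) -> int:
    """One remote put for every edge whose endpoints are owned by different cubes."""
    return sum(
        1 for src, dst in edges if owner(src, num_cubes) != owner(dst, num_cubes)
    )


def per_replica_messages(edges, num_cubes: int) -> int:
    """Assume each edge is stored in its destination's cube. A remote source then
    needs one replica per cube it points into, and each replica receives one
    update per iteration, so the message count equals the number of replicas."""
    replicas = set()
    for src, dst in edges:
        dst_cube = owner(dst, num_cubes)
        if dst_cube != owner(src, num_cubes):
            replicas.add((src, dst_cube))  # duplicates collapse into one replica
    return len(replicas)


if __name__ == "__main__":
    print(per_edge_messages(EDGES, 2), per_replica_messages(EDGES, 2))  # → 4 2
```

With two cubes, vertex 0 has three out-edges into cube 1: three remote puts per iteration under the per-edge scheme, but a single replica update under the per-replica scheme. The replica count can never exceed the cross-cube edge count, because every replica is created by at least one distinct cross-cube edge.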
CODEN	IEEPAD
DOI	10.1109/HPCA.2018.00053
Discipline	Computer Science
EISBN	153863659X; 9781538636596
EISSN	2378-203X
EndPage	557
ExternalDocumentID	8327036
Genre	orig-research
IsPeerReviewed	false
IsScholarly	true
PageCount	14
PublicationDate	2018-Feb
PublicationTitleAbbrev	HPCA
StartPage	544
URI	https://ieeexplore.ieee.org/document/8327036