GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition

Bibliographic Details
Published in Proceedings - International Symposium on High-Performance Computer Architecture, pp. 544-557
Main Authors Zhang, Mingxing; Zhuo, Youwei; Wang, Chao; Gao, Mingyu; Wu, Yongwei; Chen, Kang; Kozyrakis, Christos; Qian, Xuehai
Format Conference Proceeding
Language English
Published IEEE 01.02.2018
Abstract Processing-In-Memory (PIM) is an effective technique that reduces data movement by integrating processing units within memory. Recent advances in "big data" and 3D stacking technology make PIM a practical and viable solution for modern data processing workloads, as exemplified by recent research interest in PIM-based acceleration. Among these efforts, TESSERACT is a PIM-enabled parallel graph processing architecture based on Micron's Hybrid Memory Cube (HMC), one of the most prominent 3D-stacked memory technologies. It implements a Pregel-like vertex-centric programming model, so that users can develop programs with a familiar interface while taking advantage of PIM. Despite its orders-of-magnitude speedup over DRAM-based systems, TESSERACT generates excessive cross-cube communication through SerDes links, whose bandwidth is much lower than the aggregate local bandwidth of the HMCs. Our investigation indicates that this is caused by the restricted data organization required by the vertex programming model. In this paper, we argue that a PIM-based graph processing system should take data organization as a first-order design consideration. Following this principle, we propose GraphP, a novel HMC-based software/hardware co-designed graph processing system that drastically reduces communication and energy consumption compared to TESSERACT. GraphP features three key techniques. 1) "Source-cut" partitioning, which fundamentally changes the cross-cube communication from one remote put per cross-cube edge to one update per replica. 2) "Two-phase Vertex Program", a programming model designed for "source-cut" partitioning with two operations: GenUpdate and ApplyUpdate. 3) Hierarchical communication and overlapping, which further improve performance by exploiting unique opportunities offered by the proposed partitioning and programming model. We evaluate GraphP using a cycle-accurate simulator with 5 real-world graphs and 4 algorithms. The results show that it provides on average a 1.7x speedup and 89% energy saving compared to TESSERACT.
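The communication claim in the abstract (one remote put per cross-cube edge versus one update per replica) can be illustrated with a small sketch. This is not the authors' implementation: the round-robin placement rule `cube_of`, the choice to store each edge in the cube that owns its destination, and the split of work between the two phases are all assumptions made for illustration. The function names `count_messages` and `two_phase_iteration` are hypothetical, and the phase comments only loosely mirror the GenUpdate and ApplyUpdate operations named in the abstract.

```python
from collections import defaultdict


def cube_of(vertex, num_cubes):
    """Hypothetical placement rule: assign vertices to cubes round-robin."""
    return vertex % num_cubes


def count_messages(edges, num_cubes):
    """Return (remote puts per cross-cube edge, updates per replica)."""
    cross_cube_edges = 0
    replicas = set()  # distinct (source vertex, remote cube) pairs
    for src, dst in edges:
        dst_cube = cube_of(dst, num_cubes)
        if cube_of(src, num_cubes) != dst_cube:
            cross_cube_edges += 1          # edge-based scheme: one put per edge
            replicas.add((src, dst_cube))  # replica scheme: one update per pair
    return cross_cube_edges, len(replicas)


def two_phase_iteration(edges, values, num_cubes):
    """One sum-propagation step: each vertex sums value/out_degree of its sources."""
    out_degree = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1

    # Assumed layout: an edge lives in the cube that owns its destination.
    local_edges = defaultdict(list)
    for src, dst in edges:
        local_edges[cube_of(dst, num_cubes)].append((src, dst))

    # Phase 1 (in the spirit of GenUpdate): compute one update per source
    # vertex and copy it once to every cube that reads it.
    replica_value = {}
    remote_updates = 0
    for cube, cube_edges in local_edges.items():
        for src in {s for s, _ in cube_edges}:
            replica_value[(cube, src)] = values[src] / out_degree[src]
            if cube_of(src, num_cubes) != cube:
                remote_updates += 1  # crosses a cube boundary

    # Phase 2 (in the spirit of ApplyUpdate): every edge reads a local copy,
    # so this loop needs no cross-cube traffic.
    new_values = {v: 0.0 for v in values}
    for cube, cube_edges in local_edges.items():
        for src, dst in cube_edges:
            new_values[dst] += replica_value[(cube, src)]
    return new_values, remote_updates


EDGES = [(0, 1), (0, 3), (0, 5), (2, 1), (1, 0)]
print(count_messages(EDGES, num_cubes=2))  # (5, 3)
```

In this toy graph, vertex 0 has three edges into the other cube but needs only one replica there, which is why the replica-based count is smaller. The gap grows with the number of edges that a high-degree vertex sends into the same remote cube.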
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/HPCA.2018.00053
Discipline Computer Science
EISBN 153863659X
9781538636596
EISSN 2378-203X
EndPage 557
ExternalDocumentID 8327036
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 14
ParticipantIDs ieee_primary_8327036
PublicationCentury 2000
PublicationDate 2018-Feb
PublicationDateYYYYMMDD 2018-02-01
PublicationDate_xml – month: 02
  year: 2018
  text: 2018-Feb
PublicationDecade 2010
PublicationTitle Proceedings - International Symposium on High-Performance Computer Architecture
PublicationTitleAbbrev HPCA
PublicationYear 2018
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 544
SubjectTerms Bandwidth
Graph processing
Hybrid Memory Cube
Memory management
Organizations
Partitioning algorithms
Processing In Memory
Programming
Three-dimensional displays
Title GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition
URI https://ieeexplore.ieee.org/document/8327036