Accelerating Data Analytics on Integrated GPU Platforms via Runtime Specialization

Bibliographic Details
Published in International Journal of Parallel Programming, Vol. 46, No. 2, pp. 336–375
Main Authors Naila Farooqui, Indrajit Roy, Yuan Chen, Vanish Talwar, Rajkishore Barik, Brian Lewis, Tatiana Shpeisman, Karsten Schwan
Format Journal Article
Language English
Published New York: Springer US, 01.04.2018
Springer Nature B.V
Abstract Integrated GPU systems are a cost-effective and energy-efficient option for accelerating data-intensive applications. While these platforms offer reduced overhead for offloading computation to the GPU and the potential for fine-grained resource scheduling, several open challenges remain: (1) the distinct execution models of the heterogeneous devices present on such platforms drive the need to dynamically match workload characteristics to the underlying resources, (2) the complex architectures and programming models of such systems require substantial application knowledge to achieve high performance, and (3) as such systems become prevalent, their utility needs to extend from running known, regular data-parallel applications to the broader set of input-dependent, irregular applications common in enterprise settings. The key contribution of our research is to enable runtime specialization on such integrated GPU platforms by matching application characteristics to the underlying heterogeneous resources for both regular and irregular workloads. Our approach enables profile-driven resource management and optimizations for such platforms, providing high application performance and system throughput. Toward this end, this work proposes two novel schedulers with distinct goals: (a) a device-affinity, contention-aware scheduler that incorporates instrumentation-driven optimizations to improve the throughput of running diverse applications on integrated CPU–GPU servers, and (b) a specialized, affinity-aware work-stealing scheduler that efficiently distributes work across all CPU and GPU cores for the same application, taking into account both application characteristics and architectural differences of the underlying devices.
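The affinity-aware work-stealing scheduler in the abstract builds on the general work-stealing pattern: each worker keeps its own double-ended queue of tasks, pops local work LIFO, and steals FIFO from another worker's queue when idle. The following is a minimal, hypothetical sketch of that general pattern only, not the paper's actual scheduler (all names here are illustrative):

```python
import random
from collections import deque

def run_work_stealing(task_lists):
    """Toy work-stealing loop, illustrative only: each worker owns a
    deque of zero-argument tasks, pops from its own tail (LIFO), and
    steals from the head of a random non-empty victim when idle."""
    deques = [deque(tasks) for tasks in task_lists]
    results = []
    progressed = True
    while progressed:
        progressed = False
        for worker, own in enumerate(deques):
            if own:
                task = own.pop()                         # local work: LIFO
            else:
                victims = [d for i, d in enumerate(deques)
                           if i != worker and d]
                if not victims:
                    continue                             # nothing to steal
                task = random.choice(victims).popleft()  # steal: FIFO
            results.append(task())
            progressed = True
    return results
```

With two workers where only the first has work, the second steals from the first, so all tasks still complete; the paper's scheduler additionally weights victim selection by device affinity and profiled application characteristics.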
Author Details
1. Naila Farooqui — Intel Labs (naila.farooqui@intel.com; ORCID 0000-0001-6592-5328)
2. Indrajit Roy — Hewlett Packard Labs
3. Yuan Chen — Hewlett Packard Labs
4. Vanish Talwar — PernixData, Inc.
5. Rajkishore Barik — Intel Labs
6. Brian Lewis — Intel Labs
7. Tatiana Shpeisman — Intel Labs
8. Karsten Schwan — Georgia Institute of Technology
Copyright Springer Science+Business Media New York 2016
International Journal of Parallel Programming is a copyright of Springer, (2016). All Rights Reserved.
DOI 10.1007/s10766-016-0482-x
Discipline Computer Science
EISSN 1573-7640
ISSN 0885-7458
IsPeerReviewed true
IsScholarly true
Keywords Scheduling; Resource management; GPU; Dynamic instrumentation
References Rossbach, C.J., Currey, J., Silberstein, M., Ray, B., Witchel, E.: Ptask: operating system abstractions to manage gpus as compute devices. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. SOSP ’11, pp. 233–248. ACM, New York (2011)
Becchi, M., Sajjapongse, K., Graves, I., Procter, A., Ravi, V., Chakradhar, S.: A virtual memory based runtime to support multi-tenancy in clusters with gpus. In: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’12, pp. 97–108. ACM, New York, NY, USA (2012). doi:10.1145/2287076.2287090
Lê, N.M., Pop, A., Cohen, A., Zappa Nardelli, F.: Correct and efficient work-stealing for weak memory models. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, pp. 69–80. ACM, New York (2013). doi:10.1145/2442516.2442524
BlumofeRDLeisersonCEScheduling multithreaded computations by work stealingJ. ACM1999465720748174765310.1145/324133.3242341065.68504
Boyer, M., Skadron, K., Che, S., Jayasena, N.: Load balancing in a changing world: dealing with heterogeneity and performance variability. In: Proceedings of the ACM International Conference on Computing Frontiers, CF ’13, pp. 21:1–21:10. ACM, New York (2013)
Kaleem, R., Barik, R., Shpeisman, T., Lewis, B.T., Hu, C., Pingali, K.: Adaptive heterogeneous scheduling for integrated gpus. In: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT ’14, pp. 151–162. ACM, New York (2014). doi:10.1145/2628071.2628088
Chen, L., Villa, O., Krishnamoorthy, S., Gao, G.: Dynamic load balancing on single- and multi-GPU systems. In: IEEE International Symposium on Parallel Distributed Processing (IPDPS), pp. 1–12 (2010). doi:10.1109/IPDPS.2010.5470413
Kim, J., Kim, H., Lee, J.H., Lee, J.: Achieving a single compute device image in OpenCL for multiple GPUs. In: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, PPoPP ’11, pp. 277–288. ACM, NY, USA (2011). doi:10.1145/1941553.1941591
Grewe, D., Wang, Z., O’Boyle, M.F.P.: Portable mapping of data parallel programs to opencl for heterogeneous systems. In: IEEE Computer Society CGO, pp. 22:1–22:10 (2013). http://dblp.uni-trier.de/db/conf/cgo/cgo2013.html#GreweWO13
Rossbach, C.J., Yu, Y., Currey, J., Martin, J.P., Fetterly, D.: Dandelion: a compiler and runtime for heterogeneous systems. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. SOSP ’13, pp. 49–68. ACM, New York (2013)
Wu, H., Diamos, G., Sheard, T., Aref, M., Baxter, S., Garland, M., Yalamanchili, S.: Red fox: an execution environment for relational query processing on gpus. In: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’14, pp. 44:44–44:54. ACM, New York (2014)
Goswami, N., Shankar, R., Joshi, M., Li, T.: Exploring gpgpu workloads: Characterization methodology, analysis and microarchitecture evaluation implications. In: 2010 IEEE International Symposium on Workload Characterization (IISWC), pp. 1–10 (2010). doi:10.1109/IISWC.2010.5649549
Kumar, V., Frampton, D., Blackburn, S.M., Grove, D., Tardieu, O.: Work-stealing without the baggage. In: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA ’12, pp. 297–314. ACM, New York (2012). doi:10.1145/2384616.2384639
Ravi, V.T., Ma, W., Chiu, D., Agrawal, G.: Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. In: Proceedings of the 24th ACM International Conference on Supercomputing, ICS ’10, pp. 137–146. ACM, New York (2010). doi:10.1145/1810085.1810106
LowYBicksonDGonzalezJGuestrinCKyrolaAHellersteinJMDistributed graphlab: a framework for machine learning and data mining in the cloudProc. VLDB Endow.20125871672710.14778/2212351.2212354
Chase, D., Lev, Y.: Dynamic circular work-stealing deque. In: Proceedings of the Seventeenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’05, pp. 21–28. ACM, New York (2005). doi:10.1145/1073970.1073974
Kerr, A., Diamos, G., Yalamanchili, S.: A characterization and analysis of ptx kernels. In: IEEE International Symposium on Workload Characterization, 2009. IISWC 2009, pp. 3–12 (2009). doi:10.1109/IISWC.2009.5306801
Schaa, D., Kaeli, D.: Exploring the multiple-GPU design space. In: IEEE International Symposium on Parallel Distributed Processing. IPDPS., pp. 1–12 (2009). doi:10.1109/IPDPS.2009.5161068
Kato, S., McThrow, M., Maltzahn, C., Brandt, S.: Gdev: First-class gpu resource management in the operating system. In: Proceedings of the 2012 USENIX Conference on Annual Technical Conference. USENIX ATC’12, pp. 37–37. USENIX Association, Berkeley, CA, USA (2012)
Li, D., Becchi, M.: Deploying graph algorithms on gpus: an adaptive solution. In: 2013 IEEE 27th International Symposium on Parallel Distributed Processing (IPDPS), pp. 1013–1024 (2013). doi:10.1109/IPDPS.2013.101
Sbîrlea, A., Zou, Y., Budimlíc, Z., Cong, J., Sarkar, V.: Mapping a data-flow programming model onto heterogeneous platforms. In: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES ’12, pp. 61–70. ACM, New York (2012). doi:10.1145/2248418.2248428
Menychtas, K., Shen, K., Scott, M.L.: Disengaged scheduling for fair, protected access to fast computational accelerators. In: Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’14, pp. 301–316. ACM, New York (2014). doi:10.1145/2541940.2541963
Grewe, D., Wang, Z., O’Boyle, M.: Portable mapping of data parallel programs to OpenCL for heterogeneous systems. In: IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 1–10 (2013). doi:10.1109/CGO.2013.6494993
Lee, J., Samadi, M., Park, Y., Mahlke, S.: Transparent CPU–GPU collaboration for data-parallel kernels on heterogeneous systems. In: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, PACT (2013)
Group, K.O.W.: The OpenCL Specification (2008). http://www.khronos.org/registry/cl/specs/opencl-1.0.29.pdf
AMD: CodeXL. AMD, 3.1 edn
Gupta, V., Schwan, K., Tolia, N., Talwar, V., Ranganathan, P.: Pegasus: Coordinated scheduling for virtualized accelerator-based systems. In: Proceedings of the 2011 Usenix Annual Technical Conference, Portland, USA (2011)
Ariel, A., Fung, W.W.L., Turner, A.E., Aamodt, T.M.: Visualizing complex dynamics in many-core accelerator architectures. In: IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 164–174. IEEE Computer Society, White Plains, NY, USA (2010)
Jiménez, V.J., Vilanova, L., Gelado, I., Gil, M., Fursin, G., Navarro, N.: Predictive runtime code scheduling for heterogeneous architectures. In: Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers, HiPEAC ’09, pp. 19–33. Springer, Berlin, Heidelberg (2009). doi:10.1007/978-3-540-92990-1_4
Barik, R., Kaleem, R., Majeti, D., Lewis, B.T., Shpeisman, T., Hu, C., Ni, Y., Adl-Tabatabai, A.R.: Efficient mapping of irregular c++ applications to integrated gpus. In: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’14, pp. 33:33–33:43. ACM, New York, NY, USA (2014)
Zhang, Y., Owens, J.D.: A quantitative performance analysis model for gpu architectures. In: 17th International Conference on High-Performance Computer Architecture (HPCA-17), pp. 382–393. IEEE Computer Society, San Antonio, TX, USA (2011)
Nguyen, D., Lenharth, A., Pingali, K.: A lightweight infrastructure for graph analytics. Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. SOSP ’13, pp. 456–471. ACM, New York (2013)
Nilakant, K., Yoneki, E.: On the efficacy of apus for heterogeneous graph computation. In: Fourth Workshop on Systems for Future Multicore Architectures (2014)
NVIDIA: NVIDIA CUDA Tools SDK CUPTI. NVIDIA Corporation, Santa Clara, California, 1.0 edn. (2011)
AugonnetCThibaultSNamystRWacrenierPAStarpu: a unified platform for task scheduling on heterogeneous multicore architecturesConcurr Comput Pract Exp201123218719810.1002/cpe.1631
Luk, C.K., Hong, S., Kim, H.: Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, pp. 45–55. ACM, New York (2009). doi:10.1145/1669112.1669121
Baghsorkhi, S.S., Delahaye, M., Patel, S.J., Gropp, W.D., Hwu, W.M.W.: An adaptive performance modeling tool for gpu architectures. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’10, pp. 105–114. ACM, New York (2010). doi:10.1145/1693453.1693470
Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J., Lee, S.H., Skadron, K.: Rodinia: A benchmark suite for heterogeneous computing. In: IEEE International Symposium on Workload Characterization, 2009. IISWC 2009, pp. 44–54 (2009). doi:10.1109/IISWC.2009.5306797
Lee, K., Liu, L.: Efficient data partitioning model for heterogeneous graphs in the cloud. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’13, pp. 46:1–46:12. ACM, New York (2013)
Ravi, V.T., Becchi, M., Agrawal, G., Chakradhar, S.: Supporting gpu sharing in cloud environments with a transparent runtime consolidation framework. In: Proceedings of the 20th international symposium on High performance distributed computing, HPDC ’11, pp. 217–228. ACM, New York (2011). doi:10.1145/1996130.1996160
Kim, S., Roy, I., Talwar, V.: Evaluating integrated graphics processors for data center workloads. In: Proceedings of the Workshop on Power-Aware Computing and Systems, HotPower ’13, pp. 8:1–8:5. ACM, New York, NY, USA
482_CR17
482_CR18
482_CR15
482_CR59
482_CR16
482_CR57
482_CR14
482_CR58
482_CR11
482_CR55
482_CR12
482_CR56
482_CR19
482_CR20
482_CR64
482_CR21
482_CR65
482_CR62
482_CR63
482_CR60
482_CR61
RD Blumofe (482_CR13) 1999; 46
482_CR28
482_CR29
482_CR26
482_CR27
482_CR24
482_CR25
482_CR22
482_CR66
482_CR23
482_CR31
482_CR32
482_CR30
482_CR39
482_CR37
C Augonnet (482_CR7) 2011; 23
482_CR38
482_CR35
482_CR36
482_CR33
482_CR34
482_CR1
482_CR2
482_CR3
482_CR4
482_CR5
482_CR6
482_CR8
482_CR42
482_CR9
482_CR43
482_CR40
482_CR41
Y Low (482_CR44) 2012; 5
482_CR48
482_CR49
482_CR46
482_CR47
482_CR45
482_CR53
482_CR10
482_CR54
482_CR51
482_CR52
482_CR50
References_xml – reference: Barik, R., Kaleem, R., Majeti, D., Lewis, B.T., Shpeisman, T., Hu, C., Ni, Y., Adl-Tabatabai, A.R.: Efficient mapping of irregular c++ applications to integrated gpus. In: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’14, pp. 33:33–33:43. ACM, New York, NY, USA (2014)
– reference: Ravi, V.T., Becchi, M., Jiang, W., Agrawal, G., Chakradhar, S.: Scheduling concurrent applications on a cluster of cpu–gpu nodes. In: Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster. Cloud and Grid Computing (ccgrid 2012), CCGRID ’12, pp. 140–147. IEEE Computer Society, Washington (2012)
– reference: Kumar, V., Frampton, D., Blackburn, S.M., Grove, D., Tardieu, O.: Work-stealing without the baggage. In: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA ’12, pp. 297–314. ACM, New York (2012). doi:10.1145/2384616.2384639
– reference: Kerr, A., Diamos, G., Yalamanchili, S.: A characterization and analysis of ptx kernels. In: IEEE International Symposium on Workload Characterization, 2009. IISWC 2009, pp. 3–12 (2009). doi:10.1109/IISWC.2009.5306801
– reference: Guo, Y., Zhao, J., Cave, V., Sarkar, V.: Slaw: A scalable locality-aware adaptive work-stealing scheduler for multi-core systems. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’10, pp. 341–342. ACM, New York (2010). doi:10.1145/1693453.1693504
– reference: AMD: CodeXL. AMD, 3.1 edn
– reference: Li, D., Becchi, M.: Deploying graph algorithms on gpus: an adaptive solution. In: 2013 IEEE 27th International Symposium on Parallel Distributed Processing (IPDPS), pp. 1013–1024 (2013). doi:10.1109/IPDPS.2013.101
– reference: BlumofeRDLeisersonCEScheduling multithreaded computations by work stealingJ. ACM1999465720748174765310.1145/324133.3242341065.68504
– reference: Boyer, M., Skadron, K., Che, S., Jayasena, N.: Load balancing in a changing world: dealing with heterogeneity and performance variability. In: Proceedings of the ACM International Conference on Computing Frontiers, CF ’13, pp. 21:1–21:10. ACM, New York (2013)
– reference: Rossbach, C.J., Yu, Y., Currey, J., Martin, J.P., Fetterly, D.: Dandelion: a compiler and runtime for heterogeneous systems. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. SOSP ’13, pp. 49–68. ACM, New York (2013)
– reference: Schaa, D., Kaeli, D.: Exploring the multiple-GPU design space. In: IEEE International Symposium on Parallel Distributed Processing. IPDPS., pp. 1–12 (2009). doi:10.1109/IPDPS.2009.5161068
– reference: Luk, C.K., Hong, S., Kim, H.: Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, pp. 45–55. ACM, New York (2009). doi:10.1145/1669112.1669121
– reference: Group, K.O.W.: The OpenCL Specification (2008). http://www.khronos.org/registry/cl/specs/opencl-1.0.29.pdf
– reference: Jiménez, V.J., Vilanova, L., Gelado, I., Gil, M., Fursin, G., Navarro, N.: Predictive runtime code scheduling for heterogeneous architectures. In: Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers, HiPEAC ’09, pp. 19–33. Springer, Berlin, Heidelberg (2009). doi:10.1007/978-3-540-92990-1_4
– reference: Lê, N.M., Pop, A., Cohen, A., Zappa Nardelli, F.: Correct and efficient work-stealing for weak memory models. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, pp. 69–80. ACM, New York (2013). doi:10.1145/2442516.2442524
– reference: Agrawal, K., He, Y., Leiserson, C.E.: Adaptive work stealing with parallelism feedback. In: Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’07, pp. 112–120. ACM, New York (2007). doi:10.1145/1229428.1229448
– reference: Chase, D., Lev, Y.: Dynamic circular work-stealing deque. In: Proceedings of the Seventeenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’05, pp. 21–28. ACM, New York (2005). doi:10.1145/1073970.1073974
– reference: Grewe, D., Wang, Z., O’Boyle, M.F.P.: Portable mapping of data parallel programs to opencl for heterogeneous systems. In: IEEE Computer Society CGO, pp. 22:1–22:10 (2013). http://dblp.uni-trier.de/db/conf/cgo/cgo2013.html#GreweWO13
– reference: The Compute Architecture of Intel Processor Graphics. https://software.intel.com/en-us/file/compute-architecture-of-intel-processor-graphics-gen8pdf
– reference: Nilakant, K., Yoneki, E.: On the efficacy of APUs for heterogeneous graph computation. In: Fourth Workshop on Systems for Future Multicore Architectures (2014)
– reference: Gautier, T., Ferreira Lima, J.V., Maillard, N., Raffin, B.: Locality-aware work stealing on multi-CPU and multi-GPU architectures. In: 6th Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG). Berlin, Germany (2013). https://hal.inria.fr/hal-00780890
– reference: Kato, S., Lakshmanan, K., Rajkumar, R., Ishikawa, Y.: TimeGraph: GPU scheduling for real-time multi-tasking environments. In: Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference, USENIXATC’11, pp. 2–2. USENIX Association, Berkeley, CA, USA (2011)
– reference: Ravi, V.T., Becchi, M., Agrawal, G., Chakradhar, S.: Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework. In: Proceedings of the 20th International Symposium on High Performance Distributed Computing, HPDC ’11, pp. 217–228. ACM, New York (2011). doi:10.1145/1996130.1996160
– reference: Kim, S., Roy, I., Talwar, V.: Evaluating integrated graphics processors for data center workloads. In: Proceedings of the Workshop on Power-Aware Computing and Systems, HotPower ’13, pp. 8:1–8:5. ACM, New York, NY, USA (2013)
– reference: Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput. Pract. Exp. 23(2), 187–198 (2011). doi:10.1002/cpe.1631
– reference: IMPACT: The parboil benchmark suite (2007). http://www.crhc.uiuc.edu/IMPACT/parboil.php
– reference: Luo, L., Wong, M., Hwu, W.M.: An effective GPU implementation of breadth-first search. In: Proceedings of the 47th Design Automation Conference, pp. 52–55. ACM (2010)
– reference: Becchi, M., Sajjapongse, K., Graves, I., Procter, A., Ravi, V., Chakradhar, S.: A virtual memory based runtime to support multi-tenancy in clusters with GPUs. In: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’12, pp. 97–108. ACM, New York, NY, USA (2012). doi:10.1145/2287076.2287090
– reference: Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endow. 5(8), 716–727 (2012). doi:10.14778/2212351.2212354
– reference: Farooqui, N., Kerr, A., Eisenhauer, G., Schwan, K., Yalamanchili, S.: Lynx: a dynamic instrumentation system for data-parallel applications on GPGPU architectures. In: 2012 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 58–67 (2012). doi:10.1109/ISPASS.2012.6189206
– reference: Kaleem, R., Barik, R., Shpeisman, T., Lewis, B.T., Hu, C., Pingali, K.: Adaptive heterogeneous scheduling for integrated GPUs. In: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT ’14, pp. 151–162. ACM, New York (2014). doi:10.1145/2628071.2628088
– reference: Wang, L., Cui, H., Duan, Y., Lu, F., Feng, X., Yew, P.C.: An adaptive task creation strategy for work-stealing scheduling. In: Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’10, pp. 266–277. ACM, New York (2010). doi:10.1145/1772954.1772992
– reference: Ribic, H., Liu, Y.D.: Energy-efficient work-stealing language runtimes. In: Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’14, pp. 513–528. ACM, New York (2014). doi:10.1145/2541940.2541971
– reference: Baghsorkhi, S.S., Delahaye, M., Patel, S.J., Gropp, W.D., Hwu, W.M.W.: An adaptive performance modeling tool for gpu architectures. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’10, pp. 105–114. ACM, New York (2010). doi:10.1145/1693453.1693470
– reference: Bender, M.A., Rabin, M.O.: Scheduling cilk multithreaded parallel programs on processors of different speeds. In: Proceedings of the Twelfth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA ’00, pp. 13–21. ACM, New York (2000). doi:10.1145/341800.341803
– reference: Chatterjee, S., Grossman, M., Sbîrlea, A.S., Sarkar, V.: Dynamic task parallelism with a GPU work-stealing runtime system. In: Rajopadhye, S.V., Strout, M.M. (eds.) Languages and Compilers for Parallel Computing, 24th International Workshop, LCPC 2011, Fort Collins, CO, USA, September 8–10, 2011. Revised Selected Papers, Lecture Notes in Computer Science, vol. 7146, pp. 203–217. Springer (2011). doi:10.1007/978-3-642-36036-7_14
– reference: Gupta, V., Schwan, K., Tolia, N., Talwar, V., Ranganathan, P.: Pegasus: Coordinated scheduling for virtualized accelerator-based systems. In: Proceedings of the 2011 Usenix Annual Technical Conference, Portland, USA (2011)
– reference: Intel Threading Building Blocks. www.threadbuildingblocks.org
– reference: Phull, R., Li, C.H., Rao, K., Cadambi, H., Chakradhar, S.: Interference-driven resource management for GPU-based heterogeneous clusters. In: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’12, pp. 109–120. ACM, New York (2012). doi:10.1145/2287076.2287091
– reference: Goswami, N., Shankar, R., Joshi, M., Li, T.: Exploring GPGPU workloads: characterization methodology, analysis and microarchitecture evaluation implications. In: 2010 IEEE International Symposium on Workload Characterization (IISWC), pp. 1–10 (2010). doi:10.1109/IISWC.2010.5649549
– reference: Kim, J., Kim, H., Lee, J.H., Lee, J.: Achieving a single compute device image in OpenCL for multiple GPUs. In: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, PPoPP ’11, pp. 277–288. ACM, NY, USA (2011). doi:10.1145/1941553.1941591
– reference: Min, S.J., Iancu, C., Yelick, K.: Hierarchical work stealing on manycore clusters. In: In Fifth Conference on Partitioned Global Address Space Programming Models (2011)
– reference: Rossbach, C.J., Currey, J., Silberstein, M., Ray, B., Witchel, E.: PTask: operating system abstractions to manage GPUs as compute devices. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP ’11, pp. 233–248. ACM, New York (2011)
– reference: Pandit, P., Govindarajan, R.: Fluidic kernels: cooperative execution of OpenCL programs on multiple heterogeneous devices. In: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’14, pp. 273:273–273:283. ACM, New York (2014). doi:10.1145/2544137.2544163
– reference: Menychtas, K., Shen, K., Scott, M.L.: Disengaged scheduling for fair, protected access to fast computational accelerators. In: Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’14, pp. 301–316. ACM, New York (2014). doi:10.1145/2541940.2541963
– reference: Wu, H., Diamos, G., Sheard, T., Aref, M., Baxter, S., Garland, M., Yalamanchili, S.: Red Fox: an execution environment for relational query processing on GPUs. In: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’14, pp. 44:44–44:54. ACM, New York (2014)
– reference: Ravi, V.T., Ma, W., Chiu, D., Agrawal, G.: Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. In: Proceedings of the 24th ACM International Conference on Supercomputing, ICS ’10, pp. 137–146. ACM, New York (2010). doi:10.1145/1810085.1810106
– reference: Cederman, D., Tsigas, P.: On dynamic load balancing on graphics processors. In: Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware. GH ’08, pp. 57–64. Aire-la-Ville, Switzerland (2008)
– reference: Chen, L., Villa, O., Krishnamoorthy, S., Gao, G.: Dynamic load balancing on single- and multi-GPU systems. In: IEEE International Symposium on Parallel Distributed Processing (IPDPS), pp. 1–12 (2010). doi:10.1109/IPDPS.2010.5470413
– reference: Scogland, T., Rountree, B., Feng, W.-c., de Supinski, B.: Heterogeneous task scheduling for accelerated OpenMP. In: IEEE 26th International Parallel & Distributed Processing Symposium (IPDPS), pp. 144–155 (2012). doi:10.1109/IPDPS.2012.23
– reference: Lee, J., Samadi, M., Park, Y., Mahlke, S.: Transparent CPU–GPU collaboration for data-parallel kernels on heterogeneous systems. In: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, PACT (2013)
– reference: Sbîrlea, A., Zou, Y., Budimlíc, Z., Cong, J., Sarkar, V.: Mapping a data-flow programming model onto heterogeneous platforms. In: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES ’12, pp. 61–70. ACM, New York (2012). doi:10.1145/2248418.2248428
– reference: Kato, S., McThrow, M., Maltzahn, C., Brandt, S.: Gdev: first-class GPU resource management in the operating system. In: Proceedings of the 2012 USENIX Conference on Annual Technical Conference, USENIX ATC’12, pp. 37–37. USENIX Association, Berkeley, CA, USA (2012)
– reference: AMD: AMD APP SDK, 2.9 edn.
– reference: NVIDIA: NVIDIA Compute Visual Profiler. NVIDIA Corporation, Santa Clara, California, 4.0 edn. (2011)
– reference: Bakhoda, A., Yuan, G., Fung, W.W.L., Wong, H., Aamodt, T.M.: Analyzing CUDA workloads using a detailed GPU simulator. In: IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 163–174. Boston, MA, USA (2009)
– reference: Nguyen, D., Lenharth, A., Pingali, K.: A lightweight infrastructure for graph analytics. Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. SOSP ’13, pp. 456–471. ACM, New York (2013)
– reference: Lee, K., Liu, L.: Efficient data partitioning model for heterogeneous graphs in the cloud. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’13, pp. 46:1–46:12. ACM, New York (2013)
– reference: Burtscher, M., Nasre, R., Pingali, K.: A quantitative study of irregular programs on GPUs. In: 2012 IEEE International Symposium on Workload Characterization (IISWC), pp. 141–151 (2012). doi:10.1109/IISWC.2012.6402918
– reference: Zhang, Y., Owens, J.D.: A quantitative performance analysis model for GPU architectures. In: 17th International Conference on High-Performance Computer Architecture (HPCA-17), pp. 382–393. IEEE Computer Society, San Antonio, TX, USA (2011)
– reference: Hong, S., Kim, S.K., Oguntebi, T., Olukotun, K.: Accelerating CUDA graph algorithms at maximum warp. In: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP ’11, pp. 267–276. ACM, New York (2011)
– reference: Grewe, D., Wang, Z., O’Boyle, M.: Portable mapping of data parallel programs to OpenCL for heterogeneous systems. In: IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 1–10 (2013). doi:10.1109/CGO.2013.6494993
– reference: Collange, S., Defour, D., Parello, D.: Barra, a modular functional GPU simulator for GPGPU. Tech. Rep. hal-00359342 (2009)
– reference: Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J., Lee, S.H., Skadron, K.: Rodinia: A benchmark suite for heterogeneous computing. In: IEEE International Symposium on Workload Characterization, 2009. IISWC 2009, pp. 44–54 (2009). doi:10.1109/IISWC.2009.5306797
– reference: NVIDIA: NVIDIA CUDA Tools SDK CUPTI. NVIDIA Corporation, Santa Clara, California, 1.0 edn. (2011)
– reference: Ariel, A., Fung, W.W.L., Turner, A.E., Aamodt, T.M.: Visualizing complex dynamics in many-core accelerator architectures. In: IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 164–174. IEEE Computer Society, White Plains, NY, USA (2010)
SourceID proquest
crossref
springer
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 336
SubjectTerms Affinity
Analytics
Computer Science
Platforms
Processor Architectures
Resource management
Resource scheduling
Run time (computers)
Software Engineering/Programming and Operating Systems
System effectiveness
Theory of Computation
Title Accelerating Data Analytics on Integrated GPU Platforms via Runtime Specialization
URI https://link.springer.com/article/10.1007/s10766-016-0482-x
https://www.proquest.com/docview/2015704517
Volume 46