Accelerating Data Analytics on Integrated GPU Platforms via Runtime Specialization

Bibliographic Details
Published in International Journal of Parallel Programming, Vol. 46, No. 2, pp. 336–375
Main Authors Naila Farooqui, Indrajit Roy, Yuan Chen, Vanish Talwar, Rajkishore Barik, Brian Lewis, Tatiana Shpeisman, Karsten Schwan
Format Journal Article
Language English
Published New York: Springer US, 01.04.2018
Springer Nature B.V
Abstract Integrated GPU systems are a cost-effective and energy-efficient option for accelerating data-intensive applications. While these platforms offer reduced overhead for offloading computation to the GPU and the potential for fine-grained resource scheduling, several open challenges remain: (1) the distinct execution models of the heterogeneous devices present on such platforms drive the need to dynamically match workload characteristics to the underlying resources, (2) the complex architectures and programming models of such systems require substantial application knowledge to achieve high performance, and (3) as such systems become prevalent, their utility needs to extend from running known, regular data-parallel applications to the broader set of input-dependent, irregular applications common in enterprise settings. The key contribution of our research is to enable runtime specialization on such integrated GPU platforms by matching application characteristics to the underlying heterogeneous resources for both regular and irregular workloads. Our approach enables profile-driven resource management and optimizations for such platforms, providing high application performance and system throughput. Toward this end, this work proposes two novel schedulers with distinct goals: (a) a device-affinity, contention-aware scheduler that incorporates instrumentation-driven optimizations to improve the throughput of running diverse applications on integrated CPU–GPU servers, and (b) a specialized, affinity-aware work-stealing scheduler that efficiently distributes work across all CPU and GPU cores for the same application, taking into account both application characteristics and architectural differences of the underlying devices.
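The affinity-aware work-stealing scheduler in the abstract builds on the general work-stealing pattern: each worker keeps its own double-ended queue of tasks, pops local work LIFO, and steals FIFO from another worker's queue when idle. The following is a minimal, hypothetical sketch of that general pattern only, not the paper's actual scheduler (all names here are illustrative):

```python
import random
from collections import deque

def run_work_stealing(task_lists):
    """Toy work-stealing loop, illustrative only: each worker owns a
    deque of zero-argument tasks, pops from its own tail (LIFO), and
    steals from the head of a random non-empty victim when idle."""
    deques = [deque(tasks) for tasks in task_lists]
    results = []
    progressed = True
    while progressed:
        progressed = False
        for worker, own in enumerate(deques):
            if own:
                task = own.pop()                         # local work: LIFO
            else:
                victims = [d for i, d in enumerate(deques)
                           if i != worker and d]
                if not victims:
                    continue                             # nothing to steal
                task = random.choice(victims).popleft()  # steal: FIFO
            results.append(task())
            progressed = True
    return results
```

With two workers where only the first has work, the second steals from the first, so all tasks still complete; the paper's scheduler additionally weights victim selection by device affinity and profiled application characteristics.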
Author Details
1. Naila Farooqui — Intel Labs (naila.farooqui@intel.com; ORCID 0000-0001-6592-5328)
2. Indrajit Roy — Hewlett Packard Labs
3. Yuan Chen — Hewlett Packard Labs
4. Vanish Talwar — PernixData, Inc.
5. Rajkishore Barik — Intel Labs
6. Brian Lewis — Intel Labs
7. Tatiana Shpeisman — Intel Labs
8. Karsten Schwan — Georgia Institute of Technology
Copyright Springer Science+Business Media New York 2016
International Journal of Parallel Programming is a copyright of Springer, (2016). All Rights Reserved.
DOI 10.1007/s10766-016-0482-x
Discipline Computer Science
EISSN 1573-7640
ISSN 0885-7458
IsPeerReviewed true
IsScholarly true
Keywords Scheduling; Resource management; GPU; Dynamic instrumentation
References Rossbach, C.J., Currey, J., Silberstein, M., Ray, B., Witchel, E.: Ptask: operating system abstractions to manage gpus as compute devices. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. SOSP ’11, pp. 233–248. ACM, New York (2011)
Becchi, M., Sajjapongse, K., Graves, I., Procter, A., Ravi, V., Chakradhar, S.: A virtual memory based runtime to support multi-tenancy in clusters with gpus. In: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’12, pp. 97–108. ACM, New York, NY, USA (2012). doi:10.1145/2287076.2287090
Lê, N.M., Pop, A., Cohen, A., Zappa Nardelli, F.: Correct and efficient work-stealing for weak memory models. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, pp. 69–80. ACM, New York (2013). doi:10.1145/2442516.2442524
BlumofeRDLeisersonCEScheduling multithreaded computations by work stealingJ. ACM1999465720748174765310.1145/324133.3242341065.68504
Boyer, M., Skadron, K., Che, S., Jayasena, N.: Load balancing in a changing world: dealing with heterogeneity and performance variability. In: Proceedings of the ACM International Conference on Computing Frontiers, CF ’13, pp. 21:1–21:10. ACM, New York (2013)
Kaleem, R., Barik, R., Shpeisman, T., Lewis, B.T., Hu, C., Pingali, K.: Adaptive heterogeneous scheduling for integrated gpus. In: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT ’14, pp. 151–162. ACM, New York (2014). doi:10.1145/2628071.2628088
Chen, L., Villa, O., Krishnamoorthy, S., Gao, G.: Dynamic load balancing on single- and multi-GPU systems. In: IEEE International Symposium on Parallel Distributed Processing (IPDPS), pp. 1–12 (2010). doi:10.1109/IPDPS.2010.5470413
Kim, J., Kim, H., Lee, J.H., Lee, J.: Achieving a single compute device image in OpenCL for multiple GPUs. In: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, PPoPP ’11, pp. 277–288. ACM, NY, USA (2011). doi:10.1145/1941553.1941591
Grewe, D., Wang, Z., O’Boyle, M.F.P.: Portable mapping of data parallel programs to opencl for heterogeneous systems. In: IEEE Computer Society CGO, pp. 22:1–22:10 (2013). http://dblp.uni-trier.de/db/conf/cgo/cgo2013.html#GreweWO13
Rossbach, C.J., Yu, Y., Currey, J., Martin, J.P., Fetterly, D.: Dandelion: a compiler and runtime for heterogeneous systems. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. SOSP ’13, pp. 49–68. ACM, New York (2013)
Wu, H., Diamos, G., Sheard, T., Aref, M., Baxter, S., Garland, M., Yalamanchili, S.: Red fox: an execution environment for relational query processing on gpus. In: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’14, pp. 44:44–44:54. ACM, New York (2014)
Goswami, N., Shankar, R., Joshi, M., Li, T.: Exploring gpgpu workloads: Characterization methodology, analysis and microarchitecture evaluation implications. In: 2010 IEEE International Symposium on Workload Characterization (IISWC), pp. 1–10 (2010). doi:10.1109/IISWC.2010.5649549
Kumar, V., Frampton, D., Blackburn, S.M., Grove, D., Tardieu, O.: Work-stealing without the baggage. In: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA ’12, pp. 297–314. ACM, New York (2012). doi:10.1145/2384616.2384639
Ravi, V.T., Ma, W., Chiu, D., Agrawal, G.: Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. In: Proceedings of the 24th ACM International Conference on Supercomputing, ICS ’10, pp. 137–146. ACM, New York (2010). doi:10.1145/1810085.1810106
LowYBicksonDGonzalezJGuestrinCKyrolaAHellersteinJMDistributed graphlab: a framework for machine learning and data mining in the cloudProc. VLDB Endow.20125871672710.14778/2212351.2212354
Chase, D., Lev, Y.: Dynamic circular work-stealing deque. In: Proceedings of the Seventeenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’05, pp. 21–28. ACM, New York (2005). doi:10.1145/1073970.1073974
Kerr, A., Diamos, G., Yalamanchili, S.: A characterization and analysis of ptx kernels. In: IEEE International Symposium on Workload Characterization, 2009. IISWC 2009, pp. 3–12 (2009). doi:10.1109/IISWC.2009.5306801
Schaa, D., Kaeli, D.: Exploring the multiple-GPU design space. In: IEEE International Symposium on Parallel Distributed Processing. IPDPS., pp. 1–12 (2009). doi:10.1109/IPDPS.2009.5161068
Kato, S., McThrow, M., Maltzahn, C., Brandt, S.: Gdev: First-class gpu resource management in the operating system. In: Proceedings of the 2012 USENIX Conference on Annual Technical Conference. USENIX ATC’12, pp. 37–37. USENIX Association, Berkeley, CA, USA (2012)
Li, D., Becchi, M.: Deploying graph algorithms on gpus: an adaptive solution. In: 2013 IEEE 27th International Symposium on Parallel Distributed Processing (IPDPS), pp. 1013–1024 (2013). doi:10.1109/IPDPS.2013.101
Sbîrlea, A., Zou, Y., Budimlíc, Z., Cong, J., Sarkar, V.: Mapping a data-flow programming model onto heterogeneous platforms. In: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES ’12, pp. 61–70. ACM, New York (2012). doi:10.1145/2248418.2248428
Menychtas, K., Shen, K., Scott, M.L.: Disengaged scheduling for fair, protected access to fast computational accelerators. In: Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’14, pp. 301–316. ACM, New York (2014). doi:10.1145/2541940.2541963
Grewe, D., Wang, Z., O’Boyle, M.: Portable mapping of data parallel programs to OpenCL for heterogeneous systems. In: IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 1–10 (2013). doi:10.1109/CGO.2013.6494993
Lee, J., Samadi, M., Park, Y., Mahlke, S.: Transparent CPU–GPU collaboration for data-parallel kernels on heterogeneous systems. In: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, PACT (2013)
Group, K.O.W.: The OpenCL Specification (2008). http://www.khronos.org/registry/cl/specs/opencl-1.0.29.pdf
AMD: CodeXL. AMD, 3.1 edn
Gupta, V., Schwan, K., Tolia, N., Talwar, V., Ranganathan, P.: Pegasus: Coordinated scheduling for virtualized accelerator-based systems. In: Proceedings of the 2011 Usenix Annual Technical Conference, Portland, USA (2011)
Ariel, A., Fung, W.W.L., Turner, A.E., Aamodt, T.M.: Visualizing complex dynamics in many-core accelerator architectures. In: IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 164–174. IEEE Computer Society, White Plains, NY, USA (2010)
Jiménez, V.J., Vilanova, L., Gelado, I., Gil, M., Fursin, G., Navarro, N.: Predictive runtime code scheduling for heterogeneous architectures. In: Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers, HiPEAC ’09, pp. 19–33. Springer, Berlin, Heidelberg (2009). doi:10.1007/978-3-540-92990-1_4
Barik, R., Kaleem, R., Majeti, D., Lewis, B.T., Shpeisman, T., Hu, C., Ni, Y., Adl-Tabatabai, A.R.: Efficient mapping of irregular c++ applications to integrated gpus. In: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’14, pp. 33:33–33:43. ACM, New York, NY, USA (2014)
Zhang, Y., Owens, J.D.: A quantitative performance analysis model for gpu architectures. In: 17th International Conference on High-Performance Computer Architecture (HPCA-17), pp. 382–393. IEEE Computer Society, San Antonio, TX, USA (2011)
Nguyen, D., Lenharth, A., Pingali, K.: A lightweight infrastructure for graph analytics. Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. SOSP ’13, pp. 456–471. ACM, New York (2013)
Nilakant, K., Yoneki, E.: On the efficacy of apus for heterogeneous graph computation. In: Fourth Workshop on Systems for Future Multicore Architectures (2014)
NVIDIA: NVIDIA CUDA Tools SDK CUPTI. NVIDIA Corporation, Santa Clara, California, 1.0 edn. (2011)
AugonnetCThibaultSNamystRWacrenierPAStarpu: a unified platform for task scheduling on heterogeneous multicore architecturesConcurr Comput Pract Exp201123218719810.1002/cpe.1631
Luk, C.K., Hong, S., Kim, H.: Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, pp. 45–55. ACM, New York (2009). doi:10.1145/1669112.1669121
Baghsorkhi, S.S., Delahaye, M., Patel, S.J., Gropp, W.D., Hwu, W.M.W.: An adaptive performance modeling tool for gpu architectures. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’10, pp. 105–114. ACM, New York (2010). doi:10.1145/1693453.1693470
Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J., Lee, S.H., Skadron, K.: Rodinia: A benchmark suite for heterogeneous computing. In: IEEE International Symposium on Workload Characterization, 2009. IISWC 2009, pp. 44–54 (2009). doi:10.1109/IISWC.2009.5306797
Lee, K., Liu, L.: Efficient data partitioning model for heterogeneous graphs in the cloud. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’13, pp. 46:1–46:12. ACM, New York (2013)
Ravi, V.T., Becchi, M., Agrawal, G., Chakradhar, S.: Supporting gpu sharing in cloud environments with a transparent runtime consolidation framework. In: Proceedings of the 20th international symposium on High performance distributed computing, HPDC ’11, pp. 217–228. ACM, New York (2011). doi:10.1145/1996130.1996160
Kim, S., Roy, I., Talwar, V.: Evaluating integrated graphics processors for data center workloads. In: Proceedings of the Workshop on Power-Aware Computing and Systems, HotPower ’13, pp. 8:1–8:5. ACM, New York, NY, USA
482_CR17
482_CR18
482_CR15
482_CR59
482_CR16
482_CR57
482_CR14
482_CR58
482_CR11
482_CR55
482_CR12
482_CR56
482_CR19
482_CR20
482_CR64
482_CR21
482_CR65
482_CR62
482_CR63
482_CR60
482_CR61
RD Blumofe (482_CR13) 1999; 46
482_CR28
482_CR29
482_CR26
482_CR27
482_CR24
482_CR25
482_CR22
482_CR66
482_CR23
482_CR31
482_CR32
482_CR30
482_CR39
482_CR37
C Augonnet (482_CR7) 2011; 23
482_CR38
482_CR35
482_CR36
482_CR33
482_CR34
482_CR1
482_CR2
482_CR3
482_CR4
482_CR5
482_CR6
482_CR8
482_CR42
482_CR9
482_CR43
482_CR40
482_CR41
Y Low (482_CR44) 2012; 5
482_CR48
482_CR49
482_CR46
482_CR47
482_CR45
482_CR53
482_CR10
482_CR54
482_CR51
482_CR52
482_CR50
References_xml – reference: Barik, R., Kaleem, R., Majeti, D., Lewis, B.T., Shpeisman, T., Hu, C., Ni, Y., Adl-Tabatabai, A.R.: Efficient mapping of irregular c++ applications to integrated gpus. In: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’14, pp. 33:33–33:43. ACM, New York, NY, USA (2014)
– reference: Ravi, V.T., Becchi, M., Jiang, W., Agrawal, G., Chakradhar, S.: Scheduling concurrent applications on a cluster of cpu–gpu nodes. In: Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster. Cloud and Grid Computing (ccgrid 2012), CCGRID ’12, pp. 140–147. IEEE Computer Society, Washington (2012)
– reference: Kumar, V., Frampton, D., Blackburn, S.M., Grove, D., Tardieu, O.: Work-stealing without the baggage. In: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA ’12, pp. 297–314. ACM, New York (2012). doi:10.1145/2384616.2384639
– reference: Kerr, A., Diamos, G., Yalamanchili, S.: A characterization and analysis of ptx kernels. In: IEEE International Symposium on Workload Characterization, 2009. IISWC 2009, pp. 3–12 (2009). doi:10.1109/IISWC.2009.5306801
– reference: Guo, Y., Zhao, J., Cave, V., Sarkar, V.: Slaw: A scalable locality-aware adaptive work-stealing scheduler for multi-core systems. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’10, pp. 341–342. ACM, New York (2010). doi:10.1145/1693453.1693504
– reference: AMD: CodeXL. AMD, 3.1 edn
– reference: Li, D., Becchi, M.: Deploying graph algorithms on gpus: an adaptive solution. In: 2013 IEEE 27th International Symposium on Parallel Distributed Processing (IPDPS), pp. 1013–1024 (2013). doi:10.1109/IPDPS.2013.101
– reference: BlumofeRDLeisersonCEScheduling multithreaded computations by work stealingJ. ACM1999465720748174765310.1145/324133.3242341065.68504
– reference: Boyer, M., Skadron, K., Che, S., Jayasena, N.: Load balancing in a changing world: dealing with heterogeneity and performance variability. In: Proceedings of the ACM International Conference on Computing Frontiers, CF ’13, pp. 21:1–21:10. ACM, New York (2013)
– reference: Rossbach, C.J., Yu, Y., Currey, J., Martin, J.P., Fetterly, D.: Dandelion: a compiler and runtime for heterogeneous systems. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. SOSP ’13, pp. 49–68. ACM, New York (2013)
– reference: Schaa, D., Kaeli, D.: Exploring the multiple-GPU design space. In: IEEE International Symposium on Parallel Distributed Processing. IPDPS., pp. 1–12 (2009). doi:10.1109/IPDPS.2009.5161068
– reference: Luk, C.K., Hong, S., Kim, H.: Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, pp. 45–55. ACM, New York (2009). doi:10.1145/1669112.1669121
– reference: Group, K.O.W.: The OpenCL Specification (2008). http://www.khronos.org/registry/cl/specs/opencl-1.0.29.pdf
– reference: Jiménez, V.J., Vilanova, L., Gelado, I., Gil, M., Fursin, G., Navarro, N.: Predictive runtime code scheduling for heterogeneous architectures. In: Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers, HiPEAC ’09, pp. 19–33. Springer, Berlin, Heidelberg (2009). doi:10.1007/978-3-540-92990-1_4
– reference: Lê, N.M., Pop, A., Cohen, A., Zappa Nardelli, F.: Correct and efficient work-stealing for weak memory models. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, pp. 69–80. ACM, New York (2013). doi:10.1145/2442516.2442524
– reference: Agrawal, K., He, Y., Leiserson, C.E.: Adaptive work stealing with parallelism feedback. In: Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’07, pp. 112–120. ACM, New York (2007). doi:10.1145/1229428.1229448
– reference: Chase, D., Lev, Y.: Dynamic circular work-stealing deque. In: Proceedings of the Seventeenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’05, pp. 21–28. ACM, New York (2005). doi:10.1145/1073970.1073974
– reference: Grewe, D., Wang, Z., O’Boyle, M.F.P.: Portable mapping of data parallel programs to opencl for heterogeneous systems. In: IEEE Computer Society CGO, pp. 22:1–22:10 (2013). http://dblp.uni-trier.de/db/conf/cgo/cgo2013.html#GreweWO13
– reference: The Compute Architecture of Intel Processor Graphics. https://software.intel.com/en-us/file/compute-architecture-of-intel-processor-graphics-gen8pdf
– reference: Nilakant, K., Yoneki, E.: On the efficacy of APUs for heterogeneous graph computation. In: Fourth Workshop on Systems for Future Multicore Architectures (2014)
– reference: Gautier, T., Ferreira Lima, J.V., Maillard, N., Raffin, B.: Locality-aware work stealing on multi-CPU and multi-GPU architectures. In: 6th Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG). Berlin, Germany (2013). https://hal.inria.fr/hal-00780890
– reference: Kato, S., Lakshmanan, K., Rajkumar, R., Ishikawa, Y.: TimeGraph: GPU scheduling for real-time multi-tasking environments. In: Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference, USENIXATC’11, pp. 2–2. USENIX Association, Berkeley, CA, USA (2011)
– reference: Ravi, V.T., Becchi, M., Agrawal, G., Chakradhar, S.: Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework. In: Proceedings of the 20th International Symposium on High Performance Distributed Computing, HPDC ’11, pp. 217–228. ACM, New York (2011). doi:10.1145/1996130.1996160
– reference: Kim, S., Roy, I., Talwar, V.: Evaluating integrated graphics processors for data center workloads. In: Proceedings of the Workshop on Power-Aware Computing and Systems, HotPower ’13, pp. 8:1–8:5. ACM, New York, NY, USA (2013)
– reference: Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput. Pract. Exp. 23(2), 187–198 (2011). doi:10.1002/cpe.1631
– reference: IMPACT: The parboil benchmark suite (2007). http://www.crhc.uiuc.edu/IMPACT/parboil.php
– reference: Luo, L., Wong, M., Hwu, W.M.: An effective GPU implementation of breadth-first search. In: Proceedings of the 47th Design Automation Conference, pp. 52–55. ACM (2010)
– reference: Becchi, M., Sajjapongse, K., Graves, I., Procter, A., Ravi, V., Chakradhar, S.: A virtual memory based runtime to support multi-tenancy in clusters with GPUs. In: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’12, pp. 97–108. ACM, New York, NY, USA (2012). doi:10.1145/2287076.2287090
– reference: Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endow. 5(8), 716–727 (2012). doi:10.14778/2212351.2212354
– reference: Farooqui, N., Kerr, A., Eisenhauer, G., Schwan, K., Yalamanchili, S.: Lynx: a dynamic instrumentation system for data-parallel applications on GPGPU architectures. In: 2012 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 58–67 (2012). doi:10.1109/ISPASS.2012.6189206
– reference: Kaleem, R., Barik, R., Shpeisman, T., Lewis, B.T., Hu, C., Pingali, K.: Adaptive heterogeneous scheduling for integrated GPUs. In: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT ’14, pp. 151–162. ACM, New York (2014). doi:10.1145/2628071.2628088
– reference: Wang, L., Cui, H., Duan, Y., Lu, F., Feng, X., Yew, P.C.: An adaptive task creation strategy for work-stealing scheduling. In: Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’10, pp. 266–277. ACM, New York (2010). doi:10.1145/1772954.1772992
– reference: Ribic, H., Liu, Y.D.: Energy-efficient work-stealing language runtimes. In: Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’14, pp. 513–528. ACM, New York (2014). doi:10.1145/2541940.2541971
– reference: Baghsorkhi, S.S., Delahaye, M., Patel, S.J., Gropp, W.D., Hwu, W.M.W.: An adaptive performance modeling tool for gpu architectures. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’10, pp. 105–114. ACM, New York (2010). doi:10.1145/1693453.1693470
– reference: Bender, M.A., Rabin, M.O.: Scheduling cilk multithreaded parallel programs on processors of different speeds. In: Proceedings of the Twelfth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA ’00, pp. 13–21. ACM, New York (2000). doi:10.1145/341800.341803
– reference: Chatterjee, S., Grossman, M., Sbîrlea, A.S., Sarkar, V.: Dynamic task parallelism with a GPU work-stealing runtime system. In: Rajopadhye, S.V., Strout, M.M. (eds.) Languages and Compilers for Parallel Computing, 24th International Workshop, LCPC 2011, Fort Collins, CO, USA, September 8–10, 2011. Revised Selected Papers, Lecture Notes in Computer Science, vol. 7146, pp. 203–217. Springer (2011). doi:10.1007/978-3-642-36036-7_14
– reference: Gupta, V., Schwan, K., Tolia, N., Talwar, V., Ranganathan, P.: Pegasus: Coordinated scheduling for virtualized accelerator-based systems. In: Proceedings of the 2011 Usenix Annual Technical Conference, Portland, USA (2011)
– reference: Intel Threading Building Blocks. www.threadbuildingblocks.org
– reference: Phull, R., Li, C.H., Rao, K., Cadambi, H., Chakradhar, S.: Interference-driven resource management for GPU-based heterogeneous clusters. In: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’12, pp. 109–120. ACM, New York (2012). doi:10.1145/2287076.2287091
– reference: Goswami, N., Shankar, R., Joshi, M., Li, T.: Exploring GPGPU workloads: characterization methodology, analysis and microarchitecture evaluation implications. In: 2010 IEEE International Symposium on Workload Characterization (IISWC), pp. 1–10 (2010). doi:10.1109/IISWC.2010.5649549
– reference: Kim, J., Kim, H., Lee, J.H., Lee, J.: Achieving a single compute device image in OpenCL for multiple GPUs. In: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, PPoPP ’11, pp. 277–288. ACM, NY, USA (2011). doi:10.1145/1941553.1941591
– reference: Min, S.J., Iancu, C., Yelick, K.: Hierarchical work stealing on manycore clusters. In: In Fifth Conference on Partitioned Global Address Space Programming Models (2011)
– reference: Rossbach, C.J., Currey, J., Silberstein, M., Ray, B., Witchel, E.: PTask: operating system abstractions to manage GPUs as compute devices. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP ’11, pp. 233–248. ACM, New York (2011)
– reference: Pandit, P., Govindarajan, R.: Fluidic kernels: cooperative execution of OpenCL programs on multiple heterogeneous devices. In: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’14, pp. 273:273–273:283. ACM, New York (2014). doi:10.1145/2544137.2544163
– reference: Menychtas, K., Shen, K., Scott, M.L.: Disengaged scheduling for fair, protected access to fast computational accelerators. In: Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’14, pp. 301–316. ACM, New York (2014). doi:10.1145/2541940.2541963
– reference: Wu, H., Diamos, G., Sheard, T., Aref, M., Baxter, S., Garland, M., Yalamanchili, S.: Red Fox: an execution environment for relational query processing on GPUs. In: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’14, pp. 44:44–44:54. ACM, New York (2014)
– reference: Ravi, V.T., Ma, W., Chiu, D., Agrawal, G.: Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. In: Proceedings of the 24th ACM International Conference on Supercomputing, ICS ’10, pp. 137–146. ACM, New York (2010). doi:10.1145/1810085.1810106
– reference: Cederman, D., Tsigas, P.: On dynamic load balancing on graphics processors. In: Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware. GH ’08, pp. 57–64. Aire-la-Ville, Switzerland (2008)
– reference: Chen, L., Villa, O., Krishnamoorthy, S., Gao, G.: Dynamic load balancing on single- and multi-GPU systems. In: IEEE International Symposium on Parallel Distributed Processing (IPDPS), pp. 1–12 (2010). doi:10.1109/IPDPS.2010.5470413
– reference: Scogland, T., Rountree, B., Feng, W.-c., de Supinski, B.: Heterogeneous task scheduling for accelerated OpenMP. In: IEEE 26th International Parallel & Distributed Processing Symposium (IPDPS), pp. 144–155 (2012). doi:10.1109/IPDPS.2012.23
– reference: Lee, J., Samadi, M., Park, Y., Mahlke, S.: Transparent CPU–GPU collaboration for data-parallel kernels on heterogeneous systems. In: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, PACT (2013)
– reference: Sbîrlea, A., Zou, Y., Budimlíc, Z., Cong, J., Sarkar, V.: Mapping a data-flow programming model onto heterogeneous platforms. In: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES ’12, pp. 61–70. ACM, New York (2012). doi:10.1145/2248418.2248428
– reference: Kato, S., McThrow, M., Maltzahn, C., Brandt, S.: Gdev: first-class GPU resource management in the operating system. In: Proceedings of the 2012 USENIX Conference on Annual Technical Conference, USENIX ATC’12, pp. 37–37. USENIX Association, Berkeley, CA, USA (2012)
– reference: AMD: AMD APP SDK, 2.9 edn.
– reference: NVIDIA: NVIDIA Compute Visual Profiler. NVIDIA Corporation, Santa Clara, California, 4.0 edn. (2011)
– reference: Bakhoda, A., Yuan, G., Fung, W.W.L., Wong, H., Aamodt, T.M.: Analyzing CUDA workloads using a detailed GPU simulator. In: IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 163–174. Boston, MA, USA (2009)
– reference: Nguyen, D., Lenharth, A., Pingali, K.: A lightweight infrastructure for graph analytics. Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. SOSP ’13, pp. 456–471. ACM, New York (2013)
– reference: Lee, K., Liu, L.: Efficient data partitioning model for heterogeneous graphs in the cloud. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’13, pp. 46:1–46:12. ACM, New York (2013)
– reference: Burtscher, M., Nasre, R., Pingali, K.: A quantitative study of irregular programs on GPUs. In: 2012 IEEE International Symposium on Workload Characterization (IISWC), pp. 141–151 (2012). doi:10.1109/IISWC.2012.6402918
– reference: Zhang, Y., Owens, J.D.: A quantitative performance analysis model for GPU architectures. In: 17th International Conference on High-Performance Computer Architecture (HPCA-17), pp. 382–393. IEEE Computer Society, San Antonio, TX, USA (2011)
– reference: Hong, S., Kim, S.K., Oguntebi, T., Olukotun, K.: Accelerating CUDA graph algorithms at maximum warp. In: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP ’11, pp. 267–276. ACM, New York (2011)
– reference: Grewe, D., Wang, Z., O’Boyle, M.: Portable mapping of data parallel programs to OpenCL for heterogeneous systems. In: IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 1–10 (2013). doi:10.1109/CGO.2013.6494993
– reference: Collange, S., Defour, D., Parello, D.: Barra, a modular functional GPU simulator for GPGPU. Tech. Rep. hal-00359342 (2009)
– reference: Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J., Lee, S.H., Skadron, K.: Rodinia: A benchmark suite for heterogeneous computing. In: IEEE International Symposium on Workload Characterization, 2009. IISWC 2009, pp. 44–54 (2009). doi:10.1109/IISWC.2009.5306797
– reference: NVIDIA: NVIDIA CUDA Tools SDK CUPTI. NVIDIA Corporation, Santa Clara, California, 1.0 edn. (2011)
– reference: Ariel, A., Fung, W.W.L., Turner, A.E., Aamodt, T.M.: Visualizing complex dynamics in many-core accelerator architectures. In: IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 164–174. IEEE Computer Society, White Plains, NY, USA (2010)
SourceID proquest
crossref
springer
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 336
SubjectTerms Affinity
Analytics
Computer Science
Platforms
Processor Architectures
Resource management
Resource scheduling
Run time (computers)
Software Engineering/Programming and Operating Systems
System effectiveness
Theory of Computation
Title Accelerating Data Analytics on Integrated GPU Platforms via Runtime Specialization
URI https://link.springer.com/article/10.1007/s10766-016-0482-x
https://www.proquest.com/docview/2015704517
Volume 46