Integration framework for online thread throttling with thread and page mapping on NUMA systems

Bibliographic Details
Published in Journal of parallel and distributed computing Vol. 205; p. 105145
Main Authors Schwarzrock, Janaina, Rocha, Hiago Mayk G. de A., Lorenzon, Arthur F., de Souza, Samuel Xavier, Beck, Antonio Carlos S.
Format Journal Article
Language English
Published Elsevier Inc 01.11.2025
Abstract Non-Uniform Memory Access (NUMA) systems are prevalent in HPC, where optimal thread-to-core allocation and page placement are crucial for enhancing performance and minimizing energy usage. Moreover, because NUMA systems support a large number of hardware threads and many parallel applications have limited scalability, artificially decreasing the number of threads with Dynamic Concurrency Throttling (DCT) may bring further improvements. However, the optimal configuration (thread mapping, page mapping, number of threads) for energy and performance, quantified by the Energy-Delay Product (EDP), varies with the system hardware, application, and input set, and may even change during execution. Because of this dynamic nature, adaptability is essential, making offline strategies much less effective. Although online strategies are effective, they introduce additional execution overhead: the cost of learning at run-time and the cost of transitions between configurations, which involve cache warm-ups and thread and data reallocation. Thus, balancing learning time and solution quality becomes increasingly significant. In this scenario, this work proposes a framework that integrates the search for such optimal configurations into a single, online, and efficient approach. Our experimental evaluation shows that our framework improves EDP and performance compared to online state-of-the-art techniques of thread/page mapping (by up to 69.3% and 43.4%, respectively) and DCT (by up to 93.2% and 74.9%, respectively), while being fully adaptive and requiring minimal user intervention.
• Thread-to-core allocation and page placement are key to performance and energy efficiency in NUMA systems.
• Parallel applications often scale poorly, so Dynamic Concurrency Throttling reduces threads to improve performance.
• Optimal thread mapping, page placement, and thread count for Energy Delay Product depend on hardware and input variability.
• Online strategies often add runtime overhead, requiring a balance between learning time and solution quality.
• This study presents a framework that combines optimization strategies into a unified, efficient online solution.
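As an illustration of the metric named in the abstract, the Energy-Delay Product is conventionally computed as energy multiplied by execution time, and a lower value is better. The sketch below is not the authors' framework: it only shows, under stated assumptions, how measured candidate configurations could be compared by EDP. The `Config` type, the mapping labels, and every measurement value are hypothetical and chosen purely for demonstration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical configuration triple; field names are illustrative only."""
    threads: int
    thread_mapping: str
    page_mapping: str


def edp(energy_joules: float, time_seconds: float) -> float:
    """Energy-Delay Product: energy multiplied by execution time."""
    return energy_joules * time_seconds


def best_config(measurements: dict) -> Config:
    """Return the configuration whose (energy, time) pair gives the lowest EDP."""
    return min(measurements, key=lambda cfg: edp(*measurements[cfg]))


# Made-up measurements: configuration -> (energy in joules, time in seconds).
samples = {
    Config(64, "scatter", "interleave"): (900.0, 10.0),
    Config(32, "compact", "first-touch"): (600.0, 11.0),
    Config(16, "compact", "first-touch"): (450.0, 16.0),
}

chosen = best_config(samples)
print(chosen, edp(*samples[chosen]))
```

With these invented numbers, the 32-thread configuration has the lowest EDP even though it is neither the fastest nor the least energy-hungry, which is the kind of trade-off the abstract describes. An online approach would additionally have to account for the cost of measuring and switching between candidates, which this sketch ignores.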
ArticleNumber 105145
Author Lorenzon, Arthur F.
Schwarzrock, Janaina
Rocha, Hiago Mayk G. de A.
Beck, Antonio Carlos S.
de Souza, Samuel Xavier
Author_xml – sequence: 1
  givenname: Janaina
  surname: Schwarzrock
  fullname: Schwarzrock, Janaina
  email: jschwarzrock@inf.ufrgs.br
  organization: Institute of Informatics, Federal University of Rio Grande do Sul, Brazil
– sequence: 2
  givenname: Hiago Mayk G. de A.
  orcidid: 0000-0002-0827-0131
  surname: Rocha
  fullname: Rocha, Hiago Mayk G. de A.
  email: mayk@lncc.br
  organization: National Laboratory for Scientific Computing, Brazil
– sequence: 3
  givenname: Arthur F.
  surname: Lorenzon
  fullname: Lorenzon, Arthur F.
  email: aflorenzon@inf.ufrgs.br
  organization: Institute of Informatics, Federal University of Rio Grande do Sul, Brazil
– sequence: 4
  givenname: Samuel Xavier
  surname: de Souza
  fullname: de Souza, Samuel Xavier
  email: samuel@dca.ufrn.br
  organization: Federal University of Rio Grande do Norte, Brazil
– sequence: 5
  givenname: Antonio Carlos S.
  orcidid: 0000-0002-4492-1747
  surname: Beck
  fullname: Beck, Antonio Carlos S.
  email: caco@inf.ufrgs.br
  organization: Institute of Informatics, Federal University of Rio Grande do Sul, Brazil
ContentType Journal Article
Copyright 2025 Elsevier Inc.
Copyright_xml – notice: 2025 Elsevier Inc.
DOI 10.1016/j.jpdc.2025.105145
Discipline Computer Science
ISSN 0743-7315
IsPeerReviewed true
IsScholarly true
Keywords NUMA systems
Parallel applications
Page mapping
Dynamic concurrency throttling
Thread mapping
Language English
PublicationCentury 2000
PublicationDate November 2025
PublicationTitle Journal of parallel and distributed computing
PublicationYear 2025
Publisher Elsevier Inc
Publisher_xml – name: Elsevier Inc
References Diener, Cruz, Alves, Navaux, Busse, Heiss (br0070) 2015; 27
Dashti, Fedorova, Funston, Gaud, Lachaize, Lepers, Quema, Roth (br0020) 2013; 41
Petersen, Arbenz (br0650) 2004
Lee, Wu, Ravichandran, Clark (br0380) 2010
Subramanian, Seshadri, Kim, Jaiyen, Mutlu (br0510) 2013
Papadimitriou, Chatzidimitriou, Gizopoulos (br0120) 2019
Denoyelle, Goglin, Ilic, Jeannot, Sousa (br0540) 2018; 30
Schwarzrock, De Oliveira, Ritt, Lorenzon, Beck Filho (br0610) 2020
Li, Martinez (br0170) 2006
Eichenberger, Terboven, Wong, an Mey (br0230) 2012
Popov, Jimborean, Black-Schaffer (br0060) 2019
Li, de Supinski, Schulz, Cameron, Nikolopoulos (br0410) 2010
Gureya, Neto, Karimi, Barreto, Bhatotia, Quema, Rodrigues, Romano, Vlassov (br0010) 2020
Broquedis, Aumage, Goglin, Thibault, Wacrenier, Namyst (br0140) 2010
Curtis-Maury, Dzierwa, Antonopoulos, Nikolopoulos (br0390) 2006
Alessi, Thoman, Georgakoudis, Fahringer, Nikolopoulos (br0180) 2015
Brooks, Bose, Schuster, Jacobson, Kudva, Buyuktosunoglu, Wellman, Zyuban, Gupta, Cook (br0210) 2000; 20
Jung, Lim, Lee, Han (br0370) 2005
Constantinou, Sazeides, Michaud, Fetis, Seznec (br0590) 2005; 33
Linux (br0290) 2022
Villavieja, Karakostas, Vilanova, Etsion, Ramirez, Mendelson, Navarro, Cristal, Unsal (br0580) 2011
Sánchez Barrera, Black-Schaffer, Casas, Moretó, Stupnikova, Popov (br0480) 2020
Schwarzrock, Rocha, Lorenzon, Beck (br0570) 2022
Lepers, Quéma, Fedorova (br0160) 2015
Linux (br0250) 2021
Radojković, Carpenter, Moreto, Čakarević, Verdu, Pajuelo, Cazorla, Nemirovsky, Valero (br0550) 2015; 65
Marathe, Thakkar, Mueller (br0310) 2010; 70
Shafik, Das, Yang, Merrett, Al-Hashimi (br0430) 2015
Quinn (br0530) 2004
De Sensi, Torquati, Danelutto (br0440) 2016; 13
Cruz, Diener, Navaux (br0100) 2015; 27
Wang, Davidson, Soffa (br0360) 2016
Chadha, Mahlke, Narayanasamy (br0450) 2012
Diener, Cruz, Navaux (br0320) 2015
De Sensi (br0350) 2016
Corbet (br0130) Mar 2012
Cruz, Diener, Pilla, Navaux (br0270) 2015
Che, Boyer, Meng, Tarjan, Sheaffer, Lee, Skadron (br0640) 2009
Diener, Cruz, Alves, Navaux, Koren (br0220) 2016; 49
Jeannot, Mercier, Tessier (br0260) 2013; 25
Cruz, Diener, Navaux (br0090) 2012
Kadosh, Hasabnis, Mattson, Pinter, Oren (br0620) 2023
de A. Rocha, Schwarzrock, Lorenzon, Beck (br0680) 2022
Diener, Cruz, Navaux, Busse, Heiß (br0150) 2014
Lorenzon, De Oliveira, Souza, Beck (br0050) 2018; 30
Kleen (br0300) 2004
Bari, Chaimov, Malik, Huck, Chapman, Malony, Sarood (br0190) 2016
Rocha, Moori, Korol, Lorenzon, Beck (br0690) 2024
Schwarzrock, Jordan, Korol, d. Oliveira, Lorenzon, Beck Rutzig, Beck (br0200) 2021; 25
Trahay, Selva, Morel, Marquet (br0330) 2018
Pusukuri, Gupta, Bhuyan (br0340) 2011
Schwarzrock, Jordan, Korol, de Oliveira, Lorenzon, Rutzig, Beck (br0560) 2020
Scravaglieri, Popov, Pilla, Guermouche, Aumage, Saillard (br0490) 2023; 180
Suleman, Qureshi, Patt (br0040) 2008; 43
Dongarra, Heroux, Luszczek (br0670) 2015
Seo, Jo, Lee (br0630) 2011
McCalpin (br0660) 1995
Marathe, Bailey, Lowenthal, Rountree, Schulz, de Supinski (br0470) 2015
Diener, Cruz, Pilla, Dupros, Navaux (br0030) 2015; 88
Bailey, Barszcz, Barton, Browning, Carter, Dagum, Fatoohi, Frederickson, Lasinski, Schreiber (br0520) 1991
Diener, Cruz, Navaux (br0110) 2013
Tam, Azimi, Stumm (br0080) 2007; vol. 41
Achermann, Panwar, Bhattacharjee, Roscoe, Gandhi (br0280) 2020
Porterfield, Olivier, Bhalachandra, Prins (br0460) 2013
Curtis-Maury, Blagojevic, Antonopoulos, Nikolopoulos (br0400) 2008; 19
Sridharan, Gupta, Sohi (br0420) 2014
Raasch, Reinhardt (br0500) 2003
Schwarzrock, Rocha, Beck, Lorenzon (br0600) 2020
Dashti (10.1016/j.jpdc.2025.105145_br0020) 2013; 41
Schwarzrock (10.1016/j.jpdc.2025.105145_br0200) 2021; 25
Diener (10.1016/j.jpdc.2025.105145_br0320) 2015
Cruz (10.1016/j.jpdc.2025.105145_br0270) 2015
Suleman (10.1016/j.jpdc.2025.105145_br0040) 2008; 43
McCalpin (10.1016/j.jpdc.2025.105145_br0660) 1995
Curtis-Maury (10.1016/j.jpdc.2025.105145_br0390) 2006
Seo (10.1016/j.jpdc.2025.105145_br0630) 2011
Che (10.1016/j.jpdc.2025.105145_br0640) 2009
Diener (10.1016/j.jpdc.2025.105145_br0030) 2015; 88
Bari (10.1016/j.jpdc.2025.105145_br0190) 2016
Diener (10.1016/j.jpdc.2025.105145_br0070) 2015; 27
Brooks (10.1016/j.jpdc.2025.105145_br0210) 2000; 20
Corbet (10.1016/j.jpdc.2025.105145_br0130)
Porterfield (10.1016/j.jpdc.2025.105145_br0460) 2013
Marathe (10.1016/j.jpdc.2025.105145_br0470) 2015
Schwarzrock (10.1016/j.jpdc.2025.105145_br0560) 2020
Schwarzrock (10.1016/j.jpdc.2025.105145_br0570) 2022
Eichenberger (10.1016/j.jpdc.2025.105145_br0230) 2012
Rocha (10.1016/j.jpdc.2025.105145_br0690) 2024
Subramanian (10.1016/j.jpdc.2025.105145_br0510) 2013
Gureya (10.1016/j.jpdc.2025.105145_br0010) 2020
Wang (10.1016/j.jpdc.2025.105145_br0360) 2016
De Sensi (10.1016/j.jpdc.2025.105145_br0350) 2016
Diener (10.1016/j.jpdc.2025.105145_br0110) 2013
Curtis-Maury (10.1016/j.jpdc.2025.105145_br0400) 2008; 19
Tam (10.1016/j.jpdc.2025.105145_br0080) 2007; vol. 41
Diener (10.1016/j.jpdc.2025.105145_br0150) 2014
Raasch (10.1016/j.jpdc.2025.105145_br0500) 2003
Bailey (10.1016/j.jpdc.2025.105145_br0520) 1991
Schwarzrock (10.1016/j.jpdc.2025.105145_br0600) 2020
Schwarzrock (10.1016/j.jpdc.2025.105145_br0610) 2020
Marathe (10.1016/j.jpdc.2025.105145_br0310) 2010; 70
Villavieja (10.1016/j.jpdc.2025.105145_br0580) 2011
Diener (10.1016/j.jpdc.2025.105145_br0220) 2016; 49
Jeannot (10.1016/j.jpdc.2025.105145_br0260) 2013; 25
Trahay (10.1016/j.jpdc.2025.105145_br0330) 2018
Broquedis (10.1016/j.jpdc.2025.105145_br0140) 2010
Papadimitriou (10.1016/j.jpdc.2025.105145_br0120) 2019
Lorenzon (10.1016/j.jpdc.2025.105145_br0050) 2018; 30
Scravaglieri (10.1016/j.jpdc.2025.105145_br0490) 2023; 180
Petersen (10.1016/j.jpdc.2025.105145_br0650) 2004
Pusukuri (10.1016/j.jpdc.2025.105145_br0340) 2011
Jung (10.1016/j.jpdc.2025.105145_br0370) 2005
Radojković (10.1016/j.jpdc.2025.105145_br0550) 2015; 65
Linux (10.1016/j.jpdc.2025.105145_br0290)
Constantinou (10.1016/j.jpdc.2025.105145_br0590) 2005; 33
Linux (10.1016/j.jpdc.2025.105145_br0250)
Popov (10.1016/j.jpdc.2025.105145_br0060) 2019
Lepers (10.1016/j.jpdc.2025.105145_br0160) 2015
de A. Rocha (10.1016/j.jpdc.2025.105145_br0680) 2022
Cruz (10.1016/j.jpdc.2025.105145_br0100) 2015; 27
Li (10.1016/j.jpdc.2025.105145_br0170) 2006
Lee (10.1016/j.jpdc.2025.105145_br0380) 2010
Quinn (10.1016/j.jpdc.2025.105145_br0530) 2004
Shafik (10.1016/j.jpdc.2025.105145_br0430) 2015
Kadosh (10.1016/j.jpdc.2025.105145_br0620) 2023
Alessi (10.1016/j.jpdc.2025.105145_br0180) 2015
Kleen (10.1016/j.jpdc.2025.105145_br0300)
De Sensi (10.1016/j.jpdc.2025.105145_br0440) 2016; 13
Li (10.1016/j.jpdc.2025.105145_br0410) 2010
Sridharan (10.1016/j.jpdc.2025.105145_br0420) 2014
Chadha (10.1016/j.jpdc.2025.105145_br0450) 2012
Cruz (10.1016/j.jpdc.2025.105145_br0090) 2012
Sánchez Barrera (10.1016/j.jpdc.2025.105145_br0480) 2020
Dongarra (10.1016/j.jpdc.2025.105145_br0670) 2015
Achermann (10.1016/j.jpdc.2025.105145_br0280) 2020
Denoyelle (10.1016/j.jpdc.2025.105145_br0540) 2018; 30
References_xml – start-page: 277
  year: 2015
  end-page: 289
  ident: br0160
  article-title: Thread and memory placement on NUMA systems: asymmetry matters
  publication-title: 2015 USENIX Annual Technical Conference
– year: 2004
  ident: br0530
  article-title: Parallel Programming in C with MPI and OpenMP
– year: 2021
  ident: br0250
  article-title: The Linux kernel documentation: Linux scheduler
– volume: 19
  start-page: 1396
  year: 2008
  end-page: 1410
  ident: br0400
  article-title: Prediction-based power-performance adaptation of multithreaded scientific codes
  publication-title: IEEE Trans. Parallel Distrib. Syst.
– start-page: 236
  year: 2005
  end-page: 246
  ident: br0370
  article-title: Adaptive execution techniques for smt multiprocessor architectures
  publication-title: Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
– start-page: 1
  year: 2020
  end-page: 26
  ident: br0560
  article-title: Dynamic concurrency throttling on NUMA systems and data migration impacts
  publication-title: Des. Autom. Embed. Syst.
– volume: 25
  start-page: 135
  year: 2021
  end-page: 160
  ident: br0200
  article-title: Dynamic concurrency throttling on NUMA systems and data migration impacts
  publication-title: Des. Autom. Embed. Syst.
– volume: 30
  start-page: 1374
  year: 2018
  end-page: 1389
  ident: br0540
  article-title: Modeling non-uniform memory access on large compute nodes with the cache-aware roofline model
  publication-title: IEEE Trans. Parallel Distrib. Syst.
– start-page: 340
  year: 2011
  end-page: 349
  ident: br0580
  article-title: Didi: mitigating the performance impact of tlb shootdowns using a shared tlb directory
  publication-title: 2011 International Conference on Parallel Architectures and Compilation Techniques
– start-page: 116
  year: 2011
  end-page: 125
  ident: br0340
  article-title: Thread reinforcer: dynamically determining number of threads via os level monitoring
  publication-title: 2011 IEEE International Symposium on Workload Characterization
– start-page: 169
  year: 2014
  end-page: 180
  ident: br0420
  article-title: Adaptive, efficient, parallel execution of parallel programs
  publication-title: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation
– year: 2004
  ident: br0300
  article-title: An NUMA api for Linux
– start-page: 884
  year: 2013
  end-page: 891
  ident: br0460
  article-title: Power measurement and concurrency throttling for energy reduction in openmp programs
  publication-title: 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum
– start-page: 137
  year: 2011
  end-page: 148
  ident: br0630
  article-title: Performance characterization of the nas parallel benchmarks in opencl
  publication-title: 2011 IEEE International Symposium on Workload Characterization
– start-page: 461
  year: 2016
  end-page: 470
  ident: br0190
  article-title: Arcs: adaptive runtime configuration selection for power-constrained openmp applications
  publication-title: 2016 IEEE International Conference on Cluster Computing
– start-page: 700
  year: 2013
  end-page: 711
  ident: br0110
  article-title: Communication-based mapping using shared pages
  publication-title: 2013 IEEE 27th International Parallel and Distributed Processing Symposium
– year: Mar 2012
  ident: br0130
  article-title: Toward better NUMA scheduling
– volume: 43
  start-page: 277
  year: 2008
  end-page: 286
  ident: br0040
  article-title: Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on cmps
  publication-title: ACM SIGPLAN Not.
– start-page: 1
  year: 2010
  end-page: 10
  ident: br0140
  article-title: Structuring the execution of openmp applications for multicore architectures
  publication-title: 2010 IEEE International Symposium on Parallel and Distributed Processing
– volume: 70
  start-page: 1204
  year: 2010
  end-page: 1219
  ident: br0310
  article-title: Feedback-directed page placement for ccnuma via hardware-generated memory traces
  publication-title: J. Parallel Distrib. Comput.
– start-page: 44
  year: 2009
  end-page: 54
  ident: br0640
  article-title: Rodinia: a benchmark suite for heterogeneous computing
  publication-title: 2009 IEEE International Symposium on Workload Characterization
– year: 1995
  ident: br0660
  article-title: Memory bandwidth and machine balance in current high performance computers
  publication-title: IEEE Computer Society Technical Committee on Computer Architecture Newsletter 2 (19–25)
– volume: vol. 41
  start-page: 47
  year: 2007
  end-page: 58
  ident: br0080
  article-title: Thread Clustering: Sharing-Aware Scheduling on smp-cmp-smt Multiprocessors
  publication-title: ACM SIGOPS Operating Systems Review
– start-page: 200
  year: 2016
  end-page: 207
  ident: br0350
  article-title: Predicting performance and power consumption of parallel applications
  publication-title: 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
– start-page: 157
  year: 2006
  end-page: 166
  ident: br0390
  article-title: Online power-performance adaptation of multithreaded programs using hardware event-based prediction
  publication-title: Proceedings of the 20th Annual International Conference on Supercomputing
– volume: 65
  start-page: 256
  year: 2015
  end-page: 269
  ident: br0550
  article-title: Thread assignment in multicore/multithreaded processors: a statistical approach
  publication-title: IEEE Trans. Comput.
– start-page: 15
  year: 2012
  end-page: 28
  ident: br0230
  article-title: The design of openmp thread affinity
  publication-title: International Workshop on OpenMP
– start-page: 219
  year: 2015
  end-page: 232
  ident: br0180
  article-title: Application-level energy awareness for openmp
  publication-title: International Workshop on OpenMP
– start-page: 419
  year: 2016
  end-page: 431
  ident: br0360
  article-title: Predicting the memory bandwidth and optimal core allocations for multi-threaded applications on large-scale NUMA machines
  publication-title: 2016 IEEE International Symposium on High Performance Computer Architecture
– volume: 20
  start-page: 26
  year: 2000
  end-page: 44
  ident: br0210
  article-title: Power-aware microarchitecture: design and modeling challenges for next-generation microprocessors
  publication-title: IEEE MICRO
– start-page: 143
  year: 2024
  end-page: 148
  ident: br0690
  article-title: Enhancing graph execution for performance and energy efficiency on NUMA machines
  publication-title: 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
– start-page: 239
  year: 2020
  end-page: 246
  ident: br0600
  article-title: Effective exploration of thread throttling and thread/page mapping on NUMA systems
  publication-title: 2020 IEEE 22nd International Conference on High Performance Computing and Communications; IEEE 18th International Conference on Smart City; IEEE 6th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
– start-page: 9
  year: 2015
  end-page: 16
  ident: br0320
  article-title: Locality vs. balance: exploring data mapping policies on NUMA systems
  publication-title: 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
– start-page: 19
  year: 2018
  ident: br0330
  article-title: NumaMMA: NUMA memory analyzer
  publication-title: Proceedings of the 47th International Conference on Parallel Processing
– start-page: 962
  year: 2022
  end-page: 971
  ident: br0570
  article-title: Smoothing on dynamic concurrency throttling
  publication-title: 2022 IEEE International Parallel and Distributed Processing Symposium Workshops
– start-page: 546
  year: 2020
  end-page: 556
  ident: br0010
  article-title: Bandwidth-aware page placement in NUMA
  publication-title: 2020 IEEE International Parallel and Distributed Processing Symposium
– start-page: 1
  year: 2010
  end-page: 12
  ident: br0410
  article-title: Hybrid mpi/openmp power-aware computing
  publication-title: 2010 IEEE International Symposium on Parallel & Distributed Processing
– start-page: 133
  year: 2019
  end-page: 146
  ident: br0120
  article-title: Adaptive voltage/frequency scaling and core allocation for balanced energy and performance on multicore cpus
  publication-title: 2019 IEEE International Symposium on High Performance Computer Architecture
– volume: 33
  start-page: 80
  year: 2005
  end-page: 91
  ident: br0590
  article-title: Performance implications of single thread migration on a chip multi-core
  publication-title: Comput. Archit. News
– volume: 30
  start-page: 1007
  year: 2018
  end-page: 1021
  ident: br0050
  article-title: Aurora: seamless optimization of openmp applications
  publication-title: IEEE Trans. Parallel Distrib. Syst.
– start-page: 270
  year: 2010
  end-page: 279
  ident: br0380
  article-title: Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications
  publication-title: Proceedings of the 37th Annual International Symposium on Computer Architecture
– start-page: 639
  year: 2013
  end-page: 650
  ident: br0510
  article-title: Mise: providing performance predictability and improving fairness in shared main memory systems
  publication-title: 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
– start-page: 1
  year: 2023
  end-page: 7
  ident: br0620
  article-title: Quantifying openmp: statistical insights into usage and adoption
  publication-title: 2023 IEEE High Performance Extreme Computing Conference (HPEC)
– volume: 180
  year: 2023
  ident: br0490
  article-title: Optimizing performance and energy across problem sizes through a search space exploration and machine learning
  publication-title: J. Parallel Distrib. Comput.
– volume: 27
  start-page: 2653
  year: 2015
  end-page: 2666
  ident: br0070
  article-title: Kernel-based thread and data mapping for improved memory affinity
  publication-title: IEEE Trans. Parallel Distrib. Syst.
– start-page: 342
  year: 2019
  end-page: 353
  ident: br0060
  article-title: Efficient thread/page/parallelism autotuning for NUMA systems
  publication-title: Proceedings of the ACM International Conference on Supercomputing
– start-page: 532
  year: 2012
  end-page: 543
  ident: br0090
  article-title: Using the translation lookaside buffer to map threads in parallel applications based on shared memory
  publication-title: 2012 IEEE 26th International Parallel and Distributed Processing Symposium
– start-page: 141
  year: 2012
  end-page: 150
  ident: br0450
  article-title: When less is more (limo): controlled parallelism for improved efficiency
  publication-title: Proceedings of the 2012 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
– volume: 49
  start-page: 1
  year: 2016
  end-page: 38
  ident: br0220
  article-title: Affinity-based thread and data mapping in shared memory systems
  publication-title: ACM Comput. Surv.
– start-page: 1
  year: 2020
  end-page: 13
  ident: br0480
  article-title: Modeling and optimizing NUMA effects and prefetching with machine learning
  publication-title: Proceedings of the 34th ACM International Conference on Supercomputing
– start-page: 1027
  year: 2022
  end-page: 1032
  ident: br0680
  article-title: Using machine learning to optimize graph execution on NUMA machines
  publication-title: Proceedings of the 59th ACM/IEEE Design Automation Conference
– start-page: 15
  year: 2003
  end-page: 25
  ident: br0500
  article-title: The impact of resource partitioning on smt processors
  publication-title: 2003 12th International Conference on Parallel Architectures and Compilation Techniques
– volume: 27
  start-page: 4970
  year: 2015
  end-page: 4992
  ident: br0100
  article-title: Communication-aware thread mapping using the translation lookaside buffer
  publication-title: Concurr. Comput., Pract. Exp.
– start-page: 158
  year: 1991
  end-page: 165
  ident: br0520
  article-title: The nas parallel benchmarks summary and preliminary results
  publication-title: Supercomputing'91: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing
– volume: 13
  start-page: 1
  year: 2016
  end-page: 25
  ident: br0440
  article-title: A reconfiguration algorithm for power-aware parallel applications
  publication-title: ACM Trans. Archit. Code Optim.
– year: 2022
  ident: br0290
  article-title: The Linux kernel user's and administrator's guide: NUMA memory policy
– start-page: 394
  year: 2015
  end-page: 408
  ident: br0470
  article-title: A run-time system for power-constrained hpc applications
  publication-title: International Conference on High Performance Computing
– volume: 25
  start-page: 993
  year: 2013
  end-page: 1002
  ident: br0260
  article-title: Process placement in multicore clusters: algorithmic issues and practical techniques
  publication-title: IEEE Trans. Parallel Distrib. Syst.
– year: 2015
  ident: br0670
  article-title: Hpcg benchmark: a new metric for ranking high performance computing systems
– year: 2020
  ident: br0610
  article-title: A runtime and non-intrusive approach to optimize edp by tuning threads and cpu frequency for openmp applications
  publication-title: IEEE Trans. Parallel Distrib. Syst.
– start-page: 277
  year: 2014
  end-page: 288
  ident: br0150
  article-title: kmaf: automatic kernel-level management of thread and data affinity
  publication-title: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation
– start-page: 77
  year: 2006
  end-page: 87
  ident: br0170
  article-title: Dynamic power-performance adaptation of parallel computation on chip multiprocessors
  publication-title: The Twelfth International Symposium on High-Performance Computer Architecture, 2006
– start-page: 19
  year: 2015
  end-page: 24
  ident: br0430
  article-title: Adaptive energy minimization of openmp parallel applications on many-core systems
  publication-title: Proceedings of the 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures
– volume: 41
  start-page: 381
  year: 2013
  end-page: 394
  ident: br0020
  article-title: Traffic management: a holistic approach to memory placement on NUMA systems
  publication-title: Comput. Archit. News
– year: 2004
  ident: br0650
  article-title: Introduction to Parallel Computing: A Practical Guide with Examples in C
  publication-title: Oxford Texts in Applied and Engineering Mathematics
– volume: 88
  start-page: 18
  year: 2015
  end-page: 36
  ident: br0030
  article-title: Characterizing communication and page usage of parallel applications for thread and data mapping
  publication-title: Perform. Eval.
– start-page: 207
  year: 2015
  end-page: 214
  ident: br0270
  article-title: An efficient algorithm for communication-based task mapping
  publication-title: 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
– start-page: 283
  year: 2020
  end-page: 300
  ident: br0280
  article-title: Mitosis: transparently self-replicating page-tables for large-memory machines
  publication-title: Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
– start-page: 133
  year: 2019
  ident: 10.1016/j.jpdc.2025.105145_br0120
  article-title: Adaptive voltage/frequency scaling and core allocation for balanced energy and performance on multicore cpus
– volume: 25
  start-page: 135
  year: 2021
  ident: 10.1016/j.jpdc.2025.105145_br0200
  article-title: Dynamic concurrency throttling on NUMA systems and data migration impacts
  publication-title: Des. Autom. Embed. Syst.
  doi: 10.1007/s10617-020-09243-5
– volume: 27
  start-page: 2653
  issue: 9
  year: 2015
  ident: 10.1016/j.jpdc.2025.105145_br0070
  article-title: Kernel-based thread and data mapping for improved memory affinity
  publication-title: IEEE Trans. Parallel Distrib. Syst.
  doi: 10.1109/TPDS.2015.2504985
– volume: 43
  start-page: 277
  issue: 3
  year: 2008
  ident: 10.1016/j.jpdc.2025.105145_br0040
  article-title: Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs
  publication-title: ACM SIGPLAN Not.
  doi: 10.1145/1353536.1346317
– year: 2015
  ident: 10.1016/j.jpdc.2025.105145_br0670
– volume: 180
  year: 2023
  ident: 10.1016/j.jpdc.2025.105145_br0490
  article-title: Optimizing performance and energy across problem sizes through a search space exploration and machine learning
  publication-title: J. Parallel Distrib. Comput.
  doi: 10.1016/j.jpdc.2023.104720
– year: 2004
  ident: 10.1016/j.jpdc.2025.105145_br0530
– start-page: 1
  year: 2020
  ident: 10.1016/j.jpdc.2025.105145_br0560
  article-title: Dynamic concurrency throttling on NUMA systems and data migration impacts
  publication-title: Des. Autom. Embed. Syst.
– year: 2004
  ident: 10.1016/j.jpdc.2025.105145_br0650
  article-title: Introduction to Parallel Computing: A Practical Guide with Examples in C
  doi: 10.1093/oso/9780198515760.001.0001
– start-page: 277
  year: 2015
  ident: 10.1016/j.jpdc.2025.105145_br0160
  article-title: Thread and memory placement on NUMA systems: asymmetry matters
– volume: 19
  start-page: 1396
  issue: 10
  year: 2008
  ident: 10.1016/j.jpdc.2025.105145_br0400
  article-title: Prediction-based power-performance adaptation of multithreaded scientific codes
  publication-title: IEEE Trans. Parallel Distrib. Syst.
  doi: 10.1109/TPDS.2007.70804
– start-page: 884
  year: 2013
  ident: 10.1016/j.jpdc.2025.105145_br0460
  article-title: Power measurement and concurrency throttling for energy reduction in OpenMP programs
– start-page: 277
  year: 2014
  ident: 10.1016/j.jpdc.2025.105145_br0150
  article-title: kMAF: automatic kernel-level management of thread and data affinity
– volume: 33
  start-page: 80
  issue: 4
  year: 2005
  ident: 10.1016/j.jpdc.2025.105145_br0590
  article-title: Performance implications of single thread migration on a chip multi-core
  publication-title: Comput. Archit. News
  doi: 10.1145/1105734.1105745
– ident: 10.1016/j.jpdc.2025.105145_br0250
– start-page: 239
  year: 2020
  ident: 10.1016/j.jpdc.2025.105145_br0600
  article-title: Effective exploration of thread throttling and thread/page mapping on NUMA systems
– start-page: 1
  year: 2010
  ident: 10.1016/j.jpdc.2025.105145_br0410
  article-title: Hybrid MPI/OpenMP power-aware computing
– start-page: 44
  year: 2009
  ident: 10.1016/j.jpdc.2025.105145_br0640
  article-title: Rodinia: a benchmark suite for heterogeneous computing
– start-page: 19
  year: 2018
  ident: 10.1016/j.jpdc.2025.105145_br0330
  article-title: NumaMMA: NUMA memory analyzer
– start-page: 283
  year: 2020
  ident: 10.1016/j.jpdc.2025.105145_br0280
  article-title: Mitosis: transparently self-replicating page-tables for large-memory machines
– start-page: 962
  year: 2022
  ident: 10.1016/j.jpdc.2025.105145_br0570
  article-title: Smoothing on dynamic concurrency throttling
– start-page: 1
  year: 2023
  ident: 10.1016/j.jpdc.2025.105145_br0620
  article-title: Quantifying OpenMP: statistical insights into usage and adoption
– ident: 10.1016/j.jpdc.2025.105145_br0300
– start-page: 19
  year: 2015
  ident: 10.1016/j.jpdc.2025.105145_br0430
  article-title: Adaptive energy minimization of OpenMP parallel applications on many-core systems
– volume: 65
  start-page: 256
  issue: 1
  year: 2015
  ident: 10.1016/j.jpdc.2025.105145_br0550
  article-title: Thread assignment in multicore/multithreaded processors: a statistical approach
  publication-title: IEEE Trans. Comput.
  doi: 10.1109/TC.2015.2417533
– volume: 70
  start-page: 1204
  issue: 12
  year: 2010
  ident: 10.1016/j.jpdc.2025.105145_br0310
  article-title: Feedback-directed page placement for ccNUMA via hardware-generated memory traces
  publication-title: J. Parallel Distrib. Comput.
  doi: 10.1016/j.jpdc.2010.08.015
– volume: 41
  start-page: 381
  issue: 1
  year: 2013
  ident: 10.1016/j.jpdc.2025.105145_br0020
  article-title: Traffic management: a holistic approach to memory placement on NUMA systems
  publication-title: Comput. Archit. News
  doi: 10.1145/2490301.2451157
– start-page: 169
  year: 2014
  ident: 10.1016/j.jpdc.2025.105145_br0420
  article-title: Adaptive, efficient, parallel execution of parallel programs
– volume: 30
  start-page: 1007
  issue: 5
  year: 2018
  ident: 10.1016/j.jpdc.2025.105145_br0050
  article-title: Aurora: seamless optimization of OpenMP applications
  publication-title: IEEE Trans. Parallel Distrib. Syst.
  doi: 10.1109/TPDS.2018.2872992
– start-page: 461
  year: 2016
  ident: 10.1016/j.jpdc.2025.105145_br0190
  article-title: ARCS: adaptive runtime configuration selection for power-constrained OpenMP applications
– start-page: 340
  year: 2011
  ident: 10.1016/j.jpdc.2025.105145_br0580
  article-title: DiDi: mitigating the performance impact of TLB shootdowns using a shared TLB directory
– start-page: 342
  year: 2019
  ident: 10.1016/j.jpdc.2025.105145_br0060
  article-title: Efficient thread/page/parallelism autotuning for NUMA systems
– start-page: 158
  year: 1991
  ident: 10.1016/j.jpdc.2025.105145_br0520
  article-title: The NAS parallel benchmarks summary and preliminary results
– start-page: 394
  year: 2015
  ident: 10.1016/j.jpdc.2025.105145_br0470
  article-title: A run-time system for power-constrained HPC applications
– volume: 88
  start-page: 18
  year: 2015
  ident: 10.1016/j.jpdc.2025.105145_br0030
  article-title: Characterizing communication and page usage of parallel applications for thread and data mapping
  publication-title: Perform. Eval.
  doi: 10.1016/j.peva.2015.03.001
– start-page: 15
  year: 2012
  ident: 10.1016/j.jpdc.2025.105145_br0230
  article-title: The design of OpenMP thread affinity
– start-page: 143
  year: 2024
  ident: 10.1016/j.jpdc.2025.105145_br0690
  article-title: Enhancing graph execution for performance and energy efficiency on NUMA machines
– start-page: 200
  year: 2016
  ident: 10.1016/j.jpdc.2025.105145_br0350
  article-title: Predicting performance and power consumption of parallel applications
– start-page: 532
  year: 2012
  ident: 10.1016/j.jpdc.2025.105145_br0090
  article-title: Using the translation lookaside buffer to map threads in parallel applications based on shared memory
– start-page: 700
  year: 2013
  ident: 10.1016/j.jpdc.2025.105145_br0110
  article-title: Communication-based mapping using shared pages
– start-page: 207
  year: 2015
  ident: 10.1016/j.jpdc.2025.105145_br0270
  article-title: An efficient algorithm for communication-based task mapping
– ident: 10.1016/j.jpdc.2025.105145_br0130
– year: 2020
  ident: 10.1016/j.jpdc.2025.105145_br0610
  article-title: A runtime and non-intrusive approach to optimize EDP by tuning threads and CPU frequency for OpenMP applications
  publication-title: IEEE Trans. Parallel Distrib. Syst.
– start-page: 15
  year: 2003
  ident: 10.1016/j.jpdc.2025.105145_br0500
  article-title: The impact of resource partitioning on SMT processors
– start-page: 639
  year: 2013
  ident: 10.1016/j.jpdc.2025.105145_br0510
  article-title: MISE: providing performance predictability and improving fairness in shared main memory systems
– volume: 49
  start-page: 1
  issue: 4
  year: 2016
  ident: 10.1016/j.jpdc.2025.105145_br0220
  article-title: Affinity-based thread and data mapping in shared memory systems
  publication-title: ACM Comput. Surv.
  doi: 10.1145/3006385
– start-page: 270
  year: 2010
  ident: 10.1016/j.jpdc.2025.105145_br0380
  article-title: Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications
– year: 1995
  ident: 10.1016/j.jpdc.2025.105145_br0660
  article-title: Memory bandwidth and machine balance in current high performance computers
– volume: 25
  start-page: 993
  issue: 4
  year: 2013
  ident: 10.1016/j.jpdc.2025.105145_br0260
  article-title: Process placement in multicore clusters: algorithmic issues and practical techniques
  publication-title: IEEE Trans. Parallel Distrib. Syst.
  doi: 10.1109/TPDS.2013.104
– volume: 13
  start-page: 1
  issue: 4
  year: 2016
  ident: 10.1016/j.jpdc.2025.105145_br0440
  article-title: A reconfiguration algorithm for power-aware parallel applications
  publication-title: ACM Trans. Archit. Code Optim.
  doi: 10.1145/3004054
– start-page: 1
  year: 2020
  ident: 10.1016/j.jpdc.2025.105145_br0480
  article-title: Modeling and optimizing NUMA effects and prefetching with machine learning
– start-page: 77
  year: 2006
  ident: 10.1016/j.jpdc.2025.105145_br0170
  article-title: Dynamic power-performance adaptation of parallel computation on chip multiprocessors
– start-page: 157
  year: 2006
  ident: 10.1016/j.jpdc.2025.105145_br0390
  article-title: Online power-performance adaptation of multithreaded programs using hardware event-based prediction
– start-page: 141
  year: 2012
  ident: 10.1016/j.jpdc.2025.105145_br0450
  article-title: When less is more (LIMO): controlled parallelism for improved efficiency
– start-page: 1027
  year: 2022
  ident: 10.1016/j.jpdc.2025.105145_br0680
  article-title: Using machine learning to optimize graph execution on NUMA machines
– start-page: 419
  year: 2016
  ident: 10.1016/j.jpdc.2025.105145_br0360
  article-title: Predicting the memory bandwidth and optimal core allocations for multi-threaded applications on large-scale NUMA machines
– volume: 27
  start-page: 4970
  issue: 17
  year: 2015
  ident: 10.1016/j.jpdc.2025.105145_br0100
  article-title: Communication-aware thread mapping using the translation lookaside buffer
  publication-title: Concurr. Comput., Pract. Exp.
  doi: 10.1002/cpe.3487
– start-page: 236
  year: 2005
  ident: 10.1016/j.jpdc.2025.105145_br0370
  article-title: Adaptive execution techniques for SMT multiprocessor architectures
– start-page: 9
  year: 2015
  ident: 10.1016/j.jpdc.2025.105145_br0320
  article-title: Locality vs. balance: exploring data mapping policies on NUMA systems
– start-page: 219
  year: 2015
  ident: 10.1016/j.jpdc.2025.105145_br0180
  article-title: Application-level energy awareness for OpenMP
– start-page: 116
  year: 2011
  ident: 10.1016/j.jpdc.2025.105145_br0340
  article-title: Thread reinforcer: dynamically determining number of threads via OS level monitoring
– volume: 30
  start-page: 1374
  issue: 6
  year: 2018
  ident: 10.1016/j.jpdc.2025.105145_br0540
  article-title: Modeling non-uniform memory access on large compute nodes with the cache-aware roofline model
  publication-title: IEEE Trans. Parallel Distrib. Syst.
  doi: 10.1109/TPDS.2018.2883056
– start-page: 137
  year: 2011
  ident: 10.1016/j.jpdc.2025.105145_br0630
  article-title: Performance characterization of the NAS parallel benchmarks in OpenCL
– start-page: 1
  year: 2010
  ident: 10.1016/j.jpdc.2025.105145_br0140
  article-title: Structuring the execution of OpenMP applications for multicore architectures
– volume: 20
  start-page: 26
  issue: 6
  year: 2000
  ident: 10.1016/j.jpdc.2025.105145_br0210
  article-title: Power-aware microarchitecture: design and modeling challenges for next-generation microprocessors
  publication-title: IEEE Micro
  doi: 10.1109/40.888701
– volume: 41
  start-page: 47
  year: 2007
  ident: 10.1016/j.jpdc.2025.105145_br0080
  article-title: Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
– start-page: 546
  year: 2020
  ident: 10.1016/j.jpdc.2025.105145_br0010
  article-title: Bandwidth-aware page placement in NUMA
– ident: 10.1016/j.jpdc.2025.105145_br0290
StartPage 105145
SubjectTerms Dynamic concurrency throttling
NUMA systems
Page mapping
Parallel applications
Thread mapping
Title Integration framework for online thread throttling with thread and page mapping on NUMA systems
URI https://dx.doi.org/10.1016/j.jpdc.2025.105145
Volume 205