Adaptive parallel job scheduling with flexible coscheduling

Bibliographic Details
Published in IEEE Transactions on Parallel and Distributed Systems, Vol. 16, no. 11, pp. 1066-1077
Main Authors Frachtenberg, E., Feitelson, G., Petrini, F., Fernandez, J.
Format Journal Article
Language English
Published New York, IEEE, 01.11.2005
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Abstract Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. Because of their synchronization needs, these applications can suffer severe performance penalties if their processes are not all coscheduled to run together. Two common approaches to coscheduling jobs are batch scheduling, wherein nodes are dedicated for the duration of the run, and gang scheduling, wherein time slicing is coordinated across processors. Both work well when jobs are load-balanced and make use of the entire parallel machine. However, these conditions are rarely met and most realistic workloads consequently suffer from both internal and external fragmentation, in which resources and processors are left idle because jobs cannot be packed with perfect efficiency. This situation leads to reduced utilization and suboptimal performance. Flexible coscheduling (FCS) addresses this problem by monitoring each job's computation granularity and communication pattern and scheduling jobs based on their synchronization and load-balancing requirements. In particular, jobs that do not require stringent synchronization are identified, and are not coscheduled; instead, these processes are used to reduce fragmentation. FCS has been fully implemented on top of the STORM resource manager on a 256-processor alpha cluster and compared to batch, gang, and implicit coscheduling algorithms. This paper describes in detail the implementation of FCS and its performance evaluation with a variety of workloads, including large-scale benchmarks, scientific applications, and dynamic workloads. The experimental results show that FCS saturates at higher loads than other algorithms (up to 54 percent higher in some cases), and displays lower response times and slowdown than the other algorithms in nearly all scenarios.
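The abstract states that FCS monitors each job's computation granularity and communication pattern, and that processes without stringent synchronization needs are used to reduce fragmentation rather than being coscheduled. It does not give the classification rule itself, so the following is only a minimal sketch of that idea under stated assumptions: the class names, the two measured quantities, the thresholds, and the `pick_process` helper are all hypothetical and are not taken from the paper or from STORM.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class SchedClass(Enum):
    COSCHEDULE = "coschedule"  # fine-grained and tightly synchronized
    FILLER = "filler"          # loosely coupled; may run in otherwise idle slots


@dataclass
class ProcessStats:
    # Hypothetical per-process measurements collected by a monitor.
    compute_us_per_comm: float  # mean computation time between communication events
    blocking_fraction: float    # fraction of run time spent blocked on communication


# Illustrative thresholds only (assumptions, not values from the paper).
FINE_GRAIN_US = 1000.0
HIGH_BLOCKING = 0.10


def classify(stats: ProcessStats) -> SchedClass:
    """Mark a process as needing coscheduling only if it is both fine-grained
    and spends a noticeable share of its time blocked on communication."""
    if (stats.compute_us_per_comm < FINE_GRAIN_US
            and stats.blocking_fraction >= HIGH_BLOCKING):
        return SchedClass.COSCHEDULE
    return SchedClass.FILLER


def pick_process(slot_process: Optional[str],
                 local_classes: Dict[str, SchedClass]) -> Optional[str]:
    """Choose what one node runs in the current time slot.

    slot_process is the process assigned to this slot on this node, or None
    when the slot is empty here (a fragmentation hole)."""
    if slot_process is not None:
        return slot_process
    # Fill the hole with a process that does not need coscheduling, if any.
    for name, cls in local_classes.items():
        if cls is SchedClass.FILLER:
            return name
    return None
```

For example, a process that computes for about 50 microseconds between messages and blocks 30 percent of the time would be classified `COSCHEDULE` under these thresholds, while a process that computes for 50 milliseconds between messages would be `FILLER` and could be chosen by `pick_process` when a slot is empty.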
Author Frachtenberg, E. (Comput. & Computational Sci. Div., Los Alamos Nat. Lab., NM, USA)
Feitelson, G.
Petrini, F.
Fernandez, J.
CODEN ITDSEO
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2005
DOI 10.1109/TPDS.2005.130
Discipline Engineering
Computer Science
EISSN 1558-2183
EndPage 1077
Genre orig-research
ISSN 1045-9219
IsPeerReviewed true
IsScholarly true
Issue 11
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
PublicationDateYYYYMMDD 2005-11-01
PublicationPlace New York
PublicationTitle IEEE transactions on parallel and distributed systems
PublicationTitleAbbrev TPDS
PublicationYear 2005
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 1066
SubjectTerms Adaptive scheduling
Algorithms
Cluster computing
Clustering algorithms
Computer applications
Delay
Displays
flexible coscheduling
gang scheduling
job scheduling
Large-scale systems
load balancing
parallel architectures
Parallel machines
Processor scheduling
Resource management
Servers
Storms
Studies
Title Adaptive parallel job scheduling with flexible coscheduling
URI https://ieeexplore.ieee.org/document/1514434
https://www.proquest.com/docview/920317369
https://www.proquest.com/docview/27997468
Volume 16