Adaptive parallel job scheduling with flexible coscheduling
Published in | IEEE transactions on parallel and distributed systems Vol. 16; no. 11; pp. 1066 - 1077 |
---|---|
Main Authors | Frachtenberg, E.; Feitelson, G.; Petrini, F.; Fernandez, J. |
Format | Journal Article |
Language | English |
Published | New York; IEEE; 01.11.2005; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Abstract | Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. Because of their synchronization needs, these applications can suffer severe performance penalties if their processes are not all coscheduled to run together. Two common approaches to coscheduling jobs are batch scheduling, wherein nodes are dedicated for the duration of the run, and gang scheduling, wherein time slicing is coordinated across processors. Both work well when jobs are load-balanced and make use of the entire parallel machine. However, these conditions are rarely met and most realistic workloads consequently suffer from both internal and external fragmentation, in which resources and processors are left idle because jobs cannot be packed with perfect efficiency. This situation leads to reduced utilization and suboptimal performance. Flexible coscheduling (FCS) addresses this problem by monitoring each job's computation granularity and communication pattern and scheduling jobs based on their synchronization and load-balancing requirements. In particular, jobs that do not require stringent synchronization are identified, and are not coscheduled; instead, these processes are used to reduce fragmentation. FCS has been fully implemented on top of the STORM resource manager on a 256-processor Alpha cluster and compared to batch, gang, and implicit coscheduling algorithms. This paper describes in detail the implementation of FCS and its performance evaluation with a variety of workloads, including large-scale benchmarks, scientific applications, and dynamic workloads. The experimental results show that FCS saturates at higher loads than other algorithms (up to 54 percent higher in some cases), and displays lower response times and slowdown than the other algorithms in nearly all scenarios. |
---|---|
Author | Feitelson, G.; Fernandez, J.; Frachtenberg, E.; Petrini, F. |
Author_xml | 1. Frachtenberg, E. (Comput. & Computational Sci. Div., Los Alamos Nat. Lab., NM, USA); 2. Feitelson, G.; 3. Petrini, F.; 4. Fernandez, J. |
CODEN | ITDSEO |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2005 |
DOI | 10.1109/TPDS.2005.130 |
Discipline | Engineering Computer Science |
EISSN | 1558-2183 |
EndPage | 1077 |
ExternalDocumentID | 2581301301 10_1109_TPDS_2005_130 1514434 |
Genre | orig-research |
ISSN | 1045-9219 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 11 |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
PageCount | 12 |
PublicationDateYYYYMMDD | 2005-11-01 |
PublicationPlace | New York |
PublicationTitle | IEEE transactions on parallel and distributed systems |
PublicationTitleAbbrev | TPDS |
PublicationYear | 2005 |
Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 1066 |
SubjectTerms | Adaptive scheduling; Algorithms; Cluster computing; Clustering algorithms; Computer applications; Delay; Displays; flexible coscheduling; gang scheduling; job scheduling; Large-scale systems; load balancing; parallel architectures; Parallel machines; Processor scheduling; Resource management; Servers; Storms; Studies |
Title | Adaptive parallel job scheduling with flexible coscheduling |
URI | https://ieeexplore.ieee.org/document/1514434 https://www.proquest.com/docview/920317369 https://www.proquest.com/docview/27997468 |
Volume | 16 |
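The abstract describes FCS as measuring each job's computation granularity and communication pattern, coscheduling only the jobs that need stringent synchronization, and using the remaining processes to fill otherwise idle processors. The Python sketch below illustrates that idea in a heavily simplified form. It is not the paper's algorithm: the threshold value, the `JobStats` fields, and the helper names `granularity_ms`, `needs_coscheduling`, and `fill_time_slot` are all assumptions introduced for illustration, and whole jobs (rather than individual processes) are used as fillers to keep the example short.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff (an assumption, not a value from the paper): jobs that
# compute for less than this long between communications are treated as
# requiring strict coscheduling.
FINE_GRAIN_THRESHOLD_MS = 5.0


@dataclass
class JobStats:
    name: str
    processes: int        # number of processes in the job
    compute_ms: float     # measured compute time per process
    communications: int   # blocking communication events observed


def granularity_ms(job: JobStats) -> float:
    """Average compute time between communication events."""
    if job.communications == 0:
        return float("inf")  # a job that never communicates is maximally coarse
    return job.compute_ms / job.communications


def needs_coscheduling(job: JobStats) -> bool:
    """Fine-grained jobs synchronize often and are assumed to need coscheduling."""
    return granularity_ms(job) < FINE_GRAIN_THRESHOLD_MS


def fill_time_slot(gang: JobStats, candidates: List[JobStats],
                   total_procs: int) -> List[str]:
    """Run the gang job, then pack coarse-grained jobs onto the idle processors."""
    idle = total_procs - gang.processes
    running = [gang.name]
    for job in candidates:
        # Only jobs that tolerate uncoordinated scheduling are used as fillers.
        if not needs_coscheduling(job) and job.processes <= idle:
            running.append(job.name)
            idle -= job.processes
    return running


if __name__ == "__main__":
    fine = JobStats("fine", processes=6, compute_ms=1000.0, communications=1000)
    fine2 = JobStats("fine2", processes=2, compute_ms=100.0, communications=100)
    coarse = JobStats("coarse", processes=2, compute_ms=1000.0, communications=10)
    print(fill_time_slot(fine, [fine2, coarse], total_procs=8))
```

In this example the 6-process gang job leaves two of eight processors idle. The second fine-grained job is skipped because it would itself need coscheduling, while the coarse-grained job is placed on the idle processors, which is the kind of fragmentation reduction the abstract refers to.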