An experimental study about diskless checkpointing

Bibliographic Details
Published in Proceedings. 24th EUROMICRO Conference (Cat. No.98EX204), Vol. 1, pp. 395-402
Main Authors Silva, L.M., Silva, J.G.
Format Conference Proceeding
Language English
Published IEEE 1998
ISBN 9780818686467, 0818686464
ISSN 1089-6503
DOI 10.1109/EURMIC.1998.711832

Abstract Checkpointing and rollback recovery is a very effective technique for tolerating failures. Usually, the checkpoint data is saved in disk files. However, in some situations the disk operations may incur considerable performance overhead. An alternative is to keep the checkpoint data in main memory. The paper presents two main-memory checkpointing schemes that can be used on any parallel machine without requiring any change to the hardware: one scheme saves the checkpoints in the memory of other processors, while the other is based on a parity approach. Both techniques have been implemented and evaluated on a commercial parallel machine. The conclusions drawn clearly show the superiority of one of the schemes.
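
The two schemes named in the abstract can be illustrated with a small sketch. This record does not describe the paper's actual implementation, so everything below is an assumption made for illustration: the function names (`neighbor_checkpoint`, `parity_checkpoint`, `recover_from_parity`) are hypothetical, each processor's state is modeled as an equal-length byte string, the "memory of other processors" is modeled as a ring in which each processor holds a copy of its predecessor's state, and the parity approach is modeled as a bytewise XOR across all states.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def neighbor_checkpoint(states: list[bytes]) -> list[bytes]:
    """Assumed ring layout: slot i holds a copy of the state of processor i-1."""
    n = len(states)
    return [states[(i - 1) % n] for i in range(n)]


def parity_checkpoint(states: list[bytes]) -> bytes:
    """Single parity buffer: XOR of every processor's checkpoint."""
    return reduce(xor_bytes, states)


def recover_from_parity(states: list[bytes], failed: int, parity: bytes) -> bytes:
    """Rebuild one lost checkpoint by XOR-ing the parity with all survivors."""
    survivors = [s for i, s in enumerate(states) if i != failed]
    return reduce(xor_bytes, survivors, parity)


if __name__ == "__main__":
    states = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]

    # Neighbor scheme: processor 2 holds the copy of processor 1's state.
    mirrors = neighbor_checkpoint(states)
    print(mirrors[2] == states[1])

    # Parity scheme: lose processor 1, rebuild it from the others plus parity.
    parity = parity_checkpoint(states)
    print(recover_from_parity(states, failed=1, parity=parity) == states[1])
```

In this sketch the neighbor layout keeps one full extra copy per processor, while the parity layout keeps a single extra buffer but needs every surviving state to rebuild a lost one. Which trade-off the paper found superior is not stated in this record.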
Author Affiliation Silva, L.M.: Dept. de Engenharia Inf., Coimbra Univ., Portugal
Subject Terms Checkpointing; Computer crashes; Fault tolerance; Hardware; Maintenance; Parallel machines; Random access memory; Read-write memory; Workstations; Writing
URI https://ieeexplore.ieee.org/document/711832