RediI: Test Infrastructure to Enable Deterministic Reproduction of Failures for Distributed Systems
Published in | Proceedings / International Conference on Software Engineering pp. 191 - 203 |
---|---|
Main Authors | Feng, Yang; Lin, Zheyuan; Zhao, Dongchen; Zhou, Mengbo; Liu, Jia; Jones, James A. |
Format | Conference Proceeding |
Language | English |
Published | IEEE 26.04.2025 |
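Subjects | Computer bugs; Distributed Systems; Infrastructure; Performance analysis; Regression Testing; Runtime; Software systems; Testing |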
Abstract | Despite the fact that distributed systems have become a crucial aspect of modern technology and support many of the software systems that enable modern life, developers experience challenges in performing regression testing of these systems. Existing solutions for testing distributed systems are often either: (1) specialized testing environments that are created specifically for each system by its development team, which requires substantial effort for each team, with little-to-no sharing of this effort across teams; or (2) randomized injection tools that are often computationally expensive and offer no guarantees of preventing regressions, due to their randomness. The challenge of providing a generalized and practical solution to trigger bugs for reproducing and demonstrating failures, as well as to guard against regressions, is largely unaddressed. In this work, we present RediI, an infrastructure for supporting regression testing of distributed systems. RediI contains a dataset of real bugs on common distributed systems, along with a generalizable testing framework RediT that enables developers to write tests that can reproduce failures by providing ways to deterministically control distributed execution. In addition to the real failures in RediI from multiple distributed systems, RediT provides a reusable, programmable, platform-agnostic, deterministic testing framework for developers of distributed systems. It can help automate the running of such tests, for both practitioners and researchers. We demonstrate RediT with 63 bugs that we selected in Jira on 7 large and widely used distributed systems. Our case studies show that RediI can be used to allow developers to write tests that effectively reproduce failures on distributed systems and generate specific scenarios for regression testing, as well as providing deterministic failure injection that can help developers and researchers to better understand deterministic failures that may occur in distributed systems in the future. Additionally, our studies show that RediI is efficient for real-world system regression testing, providing a powerful tool for developers and researchers in the field of distributed-system testing. |
---|---|
Author | Zhou, Mengbo; Lin, Zheyuan; Feng, Yang; Liu, Jia; Zhao, Dongchen; Jones, James A. |
Author_xml | – sequence: 1 givenname: Yang surname: Feng fullname: Feng, Yang email: fengyang@nju.edu.cn organization: Nanjing University, State Key Laboratory for Novel Software Technology – sequence: 2 givenname: Zheyuan surname: Lin fullname: Lin, Zheyuan email: zheyuanlin@smail.nju.edu.cn organization: Nanjing University, State Key Laboratory for Novel Software Technology – sequence: 3 givenname: Dongchen surname: Zhao fullname: Zhao, Dongchen email: dongchenzhao@smail.nju.edu.cn organization: Nanjing University, State Key Laboratory for Novel Software Technology – sequence: 4 givenname: Mengbo surname: Zhou fullname: Zhou, Mengbo email: mengbozhou@smail.nju.edu.cn organization: Nanjing University, State Key Laboratory for Novel Software Technology – sequence: 5 givenname: Jia surname: Liu fullname: Liu, Jia email: liujia@nju.edu.cn organization: Nanjing University, State Key Laboratory for Novel Software Technology – sequence: 6 givenname: James A. surname: Jones fullname: Jones, James A. email: jajones@uci.edu organization: University of California, Irvine, California, USA |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DOI | 10.1109/ICSE55347.2025.00244 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 9798331505691 |
EISSN | 1558-1225 |
EndPage | 203 |
ExternalDocumentID | 11029968 |
Genre | orig-research |
GrantInformation_xml | – fundername: National Natural Science Foundation of China grantid: 62372225,62272220 funderid: 10.13039/501100001809 |
IEDL.DBID | RIE |
IngestDate | Wed Aug 27 01:40:27 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
PageCount | 13 |
ParticipantIDs | ieee_primary_11029968 |
PublicationCentury | 2000 |
PublicationDate | 2025-April-26 |
PublicationDateYYYYMMDD | 2025-04-26 |
PublicationDate_xml | – month: 04 year: 2025 text: 2025-April-26 day: 26 |
PublicationDecade | 2020 |
PublicationTitle | Proceedings / International Conference on Software Engineering |
PublicationTitleAbbrev | ICSE |
PublicationYear | 2025 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0006499 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 191 |
SubjectTerms | Computer bugs; Distributed Systems; Infrastructure; Performance analysis; Regression Testing; Runtime; Software systems; Testing |
Title | RediI: Test Infrastructure to Enable Deterministic Reproduction of Failures for Distributed Systems |
URI | https://ieeexplore.ieee.org/document/11029968 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | IEEE |