Reliability and Performance Analysis of Architecture-Based Software Implementing Restarts and Retries Subject to Correlated Component Failures

Bibliographic Details
Published in International journal of software engineering and knowledge engineering Vol. 25; no. 8; pp. 1307 - 1334
Main Authors Li, Xiao-Dan; Yin, Yong-Feng; Fiondella, Lance
Format Journal Article
Language English
Published Singapore World Scientific Publishing Company 01.10.2015
World Scientific Publishing Co. Pte., Ltd
Summary: High reliability and performance are essential attributes of software systems designed for critical real-time applications. To improve the reliability and performance of software, many systems incorporate some form of fault recovery mechanism. However, contemporary models of software reliability and performance rarely consider these fault recovery mechanisms. Another notable shortcoming of many software models is that they make the simplifying assumption that component failures are statistically independent, which disagrees with several experimental studies that have shown that the failures of software components can exhibit correlation. This paper presents an architecture-based model of software reliability and performance that explicitly considers a two-stage fault recovery mechanism implementing component restarts and application-level retries. The application architecture is characterized by a Discrete Time Markov Chain (DTMC) to represent the dynamic branching behavior of control between the components of the application. Correlations between the component failures are computed with an efficient numerical algorithm for a multivariate Bernoulli (MVB) distribution. We illustrate the utility of the model through a case study of an embedded software application. The results suggest that the model can be used to quantify the impact of software fault recovery and correlated component failures on application reliability and performance.
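Illustration (not taken from the paper): the sketch below shows one common way a DTMC over components can yield an application reliability figure. It deliberately assumes statistically independent component failures, which is the simplification the summary criticizes, and it does not implement the paper's MVB correlation algorithm. The function names, the transition matrix, the reliabilities, and the treatment of restarts and retries as independent extra attempts are all hypothetical choices made for this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def with_attempts(p, extra):
    """Success probability when up to `extra` further attempts are allowed.

    Assumption: attempts succeed independently with probability p.
    """
    return 1.0 - (1.0 - p) ** (extra + 1)


def app_reliability(P, r):
    """Probability of successful completion starting from each component.

    P[i][j] is the probability that control passes from component i to j;
    1 - sum(P[i]) is the probability that the application exits after i.
    r[i] is the reliability of component i (independent failures assumed).
    Solves x_i = r_i * (exit_i + sum_j P[i][j] * x_j).
    """
    n = len(r)
    a = [[(1.0 if i == j else 0.0) - r[i] * P[i][j] for j in range(n)]
         for i in range(n)]
    b = [r[i] * (1.0 - sum(P[i])) for i in range(n)]
    return solve(a, b)


# Hypothetical three-component application; component 0 is the entry point.
P = [[0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
r = [0.99, 0.98, 0.97]

base = app_reliability(P, r)[0]
# Stage 1: one restart per component; stage 2: one application-level retry.
restarted = app_reliability(P, [with_attempts(ri, 1) for ri in r])[0]
recovered = with_attempts(restarted, 1)
print(base, restarted, recovered)
```

Under these independence assumptions each recovery stage can only raise the computed reliability; positive correlation between failures would generally reduce that benefit, which is the effect the summary says the paper's model is meant to quantify.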
ISSN: 0218-1940, 1793-6403
DOI: 10.1142/S0218194015500266