Error Behavior Comparison of Multiple Computing Systems: A Case Study Using Linux on Pentium, Solaris on SPARC, and AIX on POWER

This paper presents an approach to conducting experimental studies for the characterization and comparison of the error behavior in different computing systems. The proposed approach is applied to characterize and compare the error behavior of three commercial systems (Linux 2.6 on Pentium 4, Solari...

Full description

Saved in:
Bibliographic Details
Published in2008 14th IEEE Pacific Rim International Symposium on Dependable Computing pp. 339 - 346
Main Authors Chen, D., Jacques-Silva, G., Kalbarczyk, Z., Iyer, R.K., Mealey, B.
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.12.2008
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:This paper presents an approach to conducting experimental studies for the characterization and comparison of the error behavior in different computing systems. The proposed approach is applied to characterize and compare the error behavior of three commercial systems (Linux 2.6 on Pentium 4, Solaris 10 on UltraSPARC IIIi, and AIX 5.3 on POWER 5) under hardware transient faults. The data is obtained by conducting extensive fault injection into kernel code, kernel stack, and system registers with the NFTAPE framework while running the Apache Web server as a workload. The error behavior comparison shows that the Linux system has the highest average crash latency, the Solaris system has the highest hang rate, and the AIX system has the lowest error sensitivity and the least amount of crashes in the more severe categories.
DOI:10.1109/PRDC.2008.35