A RISC-V Fault-Tolerant Soft-Processor Based on Full/Partial Heterogeneous Dual-Core Protection
Published in | IEEE Access, Vol. 12, pp. 30495-30506 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2024 |
Summary: | The low probability of single event upsets (SEUs) within particular satellite orbits makes commercial off-the-shelf (COTS) electronic components a viable solution for space system implementation, provided that design-level fault tolerance techniques are introduced at the expense of some performance/energy/area penalty. This paper illustrates the design and validation of a novel RISC-V dual-core architecture, based on a computing paradigm that we refer to as full/partial heterogeneous multi-core protection. The approach relies on a small, low-performance, fully fault-tolerant core (LP core) coupled with a high-performance, partially fault-tolerant core (HP core). The computing paradigm assumes that the failure-exposed HP core executes computation-intensive routines for relatively short periods of time, making the occurrence of failures statistically unlikely, while the fully fault-tolerant LP core handles critical control tasks and manages the failure recovery of the HP core. The share of execution time spent on the LP core ranges from 11.4% to 91.3%, and on the HP core from 8.7% to 88.6%, depending on the application. In the proposed study, both cores belong to the RISC-V compliant Klessydra core family. The dual-core architecture also includes a watchdog timer, controlled by the LP core, that monitors the non-protected HP core, and a context switch FIFO that speeds up the code and data switch between the two cores during failure recovery. A dedicated run-time software environment coordinates the execution of tasks on the HP core in a resilient fashion. The dual-core processor has been validated through extensive RTL simulations running in a UVM-based fault-injection environment, which emulates SEUs at various rates. Experimental results illustrate the benefits and limits of a heterogeneous architecture with different levels of protection and performance. The failure probability, assuming an SEU fault occurs, can be reduced by a factor between 10X and 30X with respect to the non-protected architecture, leading to an average failure rate of up to 4.00E-06 per second, compared with 1.80E-05 per second in the non-protected architecture. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2024.3366806 |
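
The recovery scheme outlined in the summary (a protected LP core dispatching work to an unprotected HP core, a watchdog detecting failures, and a FIFO carrying the task context) can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the paper's run-time environment or RTL: the function names `hp_core_run`, `lp_core_supervise` and `dot`, the `max_attempts` limit, the modelling of a fault as a hang, and the use of a Python `deque` as the context FIFO are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import Callable, Iterable, Optional, Tuple


def hp_core_run(fifo: deque, fault: bool) -> Optional[int]:
    """Toy model of the unprotected HP core: pop a (task, args) context and run it.

    Assumption: an injected fault is modelled as a hang, so no result is
    produced and the watchdog expires (represented here by returning None).
    """
    task, args = fifo.popleft()
    if fault:
        return None
    return task(*args)


def lp_core_supervise(task: Callable[..., int], args: Tuple[int, ...],
                      faults: Iterable[bool],
                      max_attempts: int = 4) -> Tuple[int, int]:
    """Toy model of the protected LP core: dispatch, watch, restart on timeout.

    Returns (result, attempts_used); raises RuntimeError if every attempt fails.
    The retry limit is an assumption of this sketch.
    """
    fifo: deque = deque()                    # stands in for the context switch FIFO
    fault_schedule = iter(faults)
    for attempt in range(1, max_attempts + 1):
        fifo.append((task, args))            # hand the saved context to the HP core
        fault = next(fault_schedule, False)  # schedule exhausted -> clean run
        result = hp_core_run(fifo, fault)
        if result is not None:               # finished before the watchdog expired
            return result, attempt
        # Watchdog expired: loop again and re-send the context kept by the LP core.
    raise RuntimeError("HP core failed on every attempt")


def dot(a0: int, a1: int, b0: int, b1: int) -> int:
    """Stand-in for a computation-intensive routine offloaded to the HP core."""
    return a0 * b0 + a1 * b1


if __name__ == "__main__":
    # Two injected upsets followed by a clean run.
    print(lp_core_supervise(dot, (1, 2, 3, 4), faults=[True, True, False]))  # (11, 3)
```

The fault schedule is passed in explicitly rather than drawn at random, so that a given injection pattern always produces the same outcome, loosely mirroring how a fault-injection testbench replays a chosen upset sequence.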