CED-HDC: Lightweight Concurrent Error Detection for Reliable Hyperdimensional Computing

Bibliographic Details
Published in	Proceedings - IEEE VLSI Test Symposium, pp. 1-7
Main Authors	Roodsari, Mahboobe Sadeghipour; Meyers, Vincent; Tahoori, Mehdi
Format	Conference Proceeding
Language	English
Published	IEEE 28.04.2025
Subjects	Accuracy; concurrent error detection; Fault tolerance; functional safety; Hardware; hyperdimensional computing; Noise; Performance evaluation; reliability; Reliability engineering; Robustness; Runtime; Safety; Very large scale integration
Summary:	HyperDimensional Computing (HDC) is a machine learning paradigm well suited to edge devices because of its low-overhead inference hardware and inherent robustness to bit flips and noise. In safety-critical applications, reliability is paramount, and runtime failures pose a serious threat to HDC accelerators. Although HDC tolerates several bit flips in memory without significant loss of accuracy, its performance degrades rapidly once hardware faults exceed the tolerance capacity of the algorithm. Ensuring reliable operation over the lifetime of the system therefore remains a challenge, particularly in the presence of runtime hardware failures. Conventional concurrent error detection (CED) methods often either address only a limited number of faults, which HDC's algorithmic robustness already absorbs, or incur significant hardware overhead, which contradicts the lightweight nature of HDC implementations. In this work, we propose a lightweight CED method tailored to HDC systems. Our method dynamically detects faults before they cause noticeable accuracy degradation. It introduces negligible hardware overhead (< 0.1%) and no additional latency, and it ensures 100% coverage of critical errors.
ISSN:	2375-1053
DOI:	10.1109/VTS65138.2025.11022900