Testing and Understanding Deviation Behaviors in FHE-Hardened Machine Learning Models

Bibliographic Details
Published in	Proceedings / International Conference on Software Engineering, pp. 2251-2263
Main Authors	Peng, Yiteng, Wu, Daoyuan, Liu, Zhibo, Xiao, Dongwei, Ji, Zhenlan, Rahmel, Juergen, Wang, Shuai
Format	Conference Proceeding
Language	English
Published	IEEE 26.04.2025
Subjects	Computational modeling; Cryptography; Machine learning; Measurement; Noise; Predictive models; Prevention and mitigation; Software engineering; Systematics; Testing

More Information
Summary:	Fully homomorphic encryption (FHE) is a promising cryptographic primitive that enables secure computation over encrypted data. A primary use of FHE is to support privacy-preserving machine learning (ML) on public cloud infrastructures. Despite the rapid development of FHE-based ML (or HE-ML), the community lacks a systematic understanding of its robustness. In this paper, we aim to systematically test and understand the deviation behaviors of HE-ML models, where the same input causes deviant outputs between FHE-hardened models and their plaintext versions, leading to completely incorrect model predictions. To effectively uncover deviation-triggering inputs under the constraints of expensive FHE computations, we design a novel differential testing tool called HEDIFF, which leverages the margin metric on the plaintext model as guidance to drive targeted testing on FHE models. For the identified deviation inputs, we further analyze them to determine whether they exhibit general noise patterns that are transferable. We evaluate HEDIFF using three popular HE-ML frameworks, covering 12 different combinations of models and datasets. HEDIFF successfully detected hundreds of deviation inputs across almost every tested FHE framework and model. We also quantitatively show that the identified deviation inputs are (visually) meaningful in comparison to regular inputs. Further schematic analysis reveals the root cause of these deviant inputs and allows us to generalize their noise patterns for more directed testing. Our work sheds light on enabling robust HE-ML for real-world usage.
ISSN:	1558-1225
DOI:	10.1109/ICSE55347.2025.00107
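The summary describes margin-guided differential testing only at a high level. The Python sketch below illustrates the general idea under loudly stated assumptions: the two-class linear model, its weights, the fixed per-class error that stands in for FHE evaluation, and every function name (`plain_model`, `fhe_model`, `is_deviation`, `margin_guided_search`) are hypothetical, and this is not HEDIFF's implementation or any HE-ML framework's API. "Margin" is taken here to mean the gap between the top-1 and top-2 logits; the cheap plaintext margin steers a random mutation search, and the costly "FHE" model is queried only for candidates that move closer to a decision boundary.

```python
import random
from typing import List, Optional, Sequence

Vector = List[float]

# Hypothetical 2-class linear "plaintext model"; weights are illustrative only.
WEIGHTS = [[1.0, 0.2], [0.2, 1.0]]


def plain_model(x: Sequence[float]) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) for row in WEIGHTS]


# Stand-in for the FHE-hardened model (assumption): plaintext logits plus a
# small fixed per-class error, mimicking accumulated approximation error.
# A real HE-ML framework would evaluate the model on ciphertexts instead.
FHE_ERROR = [0.02, -0.02]


def fhe_model(x: Sequence[float]) -> Vector:
    return [v + e for v, e in zip(plain_model(x), FHE_ERROR)]


def argmax(logits: Sequence[float]) -> int:
    return max(range(len(logits)), key=lambda i: logits[i])


def margin(logits: Sequence[float]) -> float:
    """Top-1 minus top-2 logit; a small value means the input is near a decision boundary."""
    top1, top2 = sorted(logits, reverse=True)[:2]
    return top1 - top2


def is_deviation(x: Sequence[float]) -> bool:
    """Differential oracle: the same input yields different predicted classes."""
    return argmax(plain_model(x)) != argmax(fhe_model(x))


def margin_guided_search(seed_input: Sequence[float], steps: int = 2000,
                         step_size: float = 0.05, seed: int = 0) -> Optional[Vector]:
    rng = random.Random(seed)
    best = list(seed_input)
    best_margin = margin(plain_model(best))
    for _ in range(steps):
        # Randomly perturb the current best input.
        cand = [v + rng.uniform(-step_size, step_size) for v in best]
        cand_margin = margin(plain_model(cand))  # cheap plaintext guidance
        if cand_margin < best_margin:
            best, best_margin = cand, cand_margin
            # Query the (expensive) FHE stand-in only for candidates that
            # moved closer to the plaintext decision boundary.
            if is_deviation(cand):
                return cand
    return None


if __name__ == "__main__":
    print("deviation input:", margin_guided_search([1.0, 0.0]))
```

The design choice worth noting is the asymmetry of cost: plaintext inference is used freely to rank candidates, while the differential check against the FHE model is reserved for the few candidates that shrink the margin, since inputs near a decision boundary are the ones most likely to have their prediction flipped by small FHE-induced errors.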