HE-Booster: An Efficient Polynomial Arithmetic Acceleration on GPUs for Fully Homomorphic Encryption

Bibliographic Details
Published in IEEE transactions on parallel and distributed systems Vol. 34; no. 4; pp. 1 - 17
Main Authors Wang, Zhiwei; Li, Peinan; Hou, Rui; Li, Zhihao; Cao, Jiangfeng; Wang, XiaoFeng; Meng, Dan
Format Journal Article
Language English
Published New York IEEE 01.04.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Fully Homomorphic Encryption (FHE) enables secure offloading of computations to untrusted cloud servers because it allows computing on encrypted data. However, existing well-known FHE schemes suffer from heavy performance overheads, so numerous accelerators based on FPGAs, ASICs, and GPUs have been proposed. Compared to FPGAs and ASICs, GPUs have clear advantages in productivity and development cost, and they are already widely deployed in commercial clouds and supercomputing centers. Therefore, we present HE-Booster, an efficient GPU-based FHE acceleration design. For single-GPU acceleration, a systematic design maps the five common phases of typical FHE schemes to the GPU parallel architecture. In particular, inspired by the regular architecture of NTT/INTT, a novel inter-thread local synchronization is proposed to exploit thread-level parallelism. For multi-GPU acceleration, we propose a scalable parallelization design that exploits data-level parallelism through fine-grained data partitioning under different representations. Finally, experiments on a single NVIDIA GPU demonstrate speedups of 251.7×, 78.5×, and 164.9× over three mainstream CPU-based libraries, HElib, SEAL, and PALISADE, respectively, and a speedup of up to 170.5× over the GPU-accelerated library cuHE. Moreover, performing 8 homomorphic multiplications on 8 GPUs delivers up to a 7.66× performance boost compared to a single-GPU implementation.
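For readers unfamiliar with the NTT/INTT phase named in the summary, the following is a minimal CPU-side sketch of a textbook radix-2 number-theoretic transform and the polynomial product it enables. It is not HE-Booster's GPU design: the modulus `P`, the generator `G`, the function names, and the use of cyclic convolution modulo x^n - 1 (FHE schemes typically use a negacyclic variant) are all assumptions chosen for illustration. Within each stage of the loop the butterflies touch disjoint pairs of elements, and this regular structure is the kind of parallelism a GPU implementation can exploit.

```python
# Illustrative parameters (assumptions, not taken from the paper).
P = 998244353  # NTT-friendly prime: P - 1 = 119 * 2**23
G = 3          # a primitive root modulo P


def ntt(a, invert=False):
    """Iterative radix-2 Cooley-Tukey NTT over Z_P; returns a new list."""
    a = list(a)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"

    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) stages; butterflies inside one stage are mutually independent.
    length = 2
    while length <= n:
        w_len = pow(G, (P - 1) // length, P)  # primitive length-th root of unity
        if invert:
            w_len = pow(w_len, P - 2, P)      # modular inverse via Fermat
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % P
                a[start + k] = (u + v) % P
                a[start + k + half] = (u - v) % P
                w = w * w_len % P
        length *= 2

    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a


def cyclic_multiply(f, g):
    """Multiply f and g modulo (x^n - 1) with coefficients modulo P."""
    fa, ga = ntt(f), ntt(g)
    return ntt([x * y % P for x, y in zip(fa, ga)], invert=True)
```

For example, `cyclic_multiply([1, 2, 0, 0], [3, 4, 0, 0])` returns `[3, 10, 8, 0]`, the coefficients of (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2.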
ISSN: 1045-9219, 1558-2183
DOI: 10.1109/TPDS.2022.3228628