Multi-GPU Design and Performance Evaluation of Homomorphic Encryption on GPU Clusters

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, Vol. 32, No. 2, pp. 379-391
Main Authors: Al Badawi, Ahmad; Veeravalli, Bharadwaj; Lin, Jie; Xiao, Nan; Matsumura, Kazuaki; Aung, Khin Mi Mi
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.02.2021
Summary: We present a multi-GPU design, implementation and performance evaluation of the Halevi-Polyakov-Shoup (HPS) variant of the Fan-Vercauteren (FV) levelled Fully Homomorphic Encryption (FHE) scheme. Our design follows a data-parallel approach and uses partitioning methods to distribute the workload of FV primitives evenly across the available GPUs. The design is geared to address the space and runtime requirements of FHE computations. It is also suitable for distributed-memory architectures and includes efficient GPU-to-GPU data-exchange protocols. Moreover, it is user-friendly: no user intervention is required for task decomposition, scheduling or load balancing. We implement and evaluate the performance of our design on two NVIDIA GPU clusters, one homogeneous and one heterogeneous: K80, and a customized P100. We also provide a comparison with a recent shared-memory-based multi-core CPU implementation, using two homomorphic circuits as workloads: vector addition and vector multiplication.
Moreover, we use our multi-GPU levelled FHE to implement the inference circuit of two Convolutional Neural Networks (CNNs), performing image classification homomorphically on encrypted images from the MNIST and CIFAR-10 datasets. Our implementation achieves 1 to 3 orders of magnitude speedup over the CPU implementation on vector operations. In terms of scalability, our design exhibits reasonable scaling when the GPUs are fully connected.
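The even workload distribution the summary describes can be illustrated with a minimal sketch. Note this is a hypothetical illustration, not the paper's actual code: it assumes the ciphertext workload is a set of independent RNS residue polynomials (as in the HPS variant) and splits them into contiguous blocks whose sizes differ by at most one, so no GPU is assigned more than one extra residue.

```python
# Hypothetical sketch of even data-parallel partitioning across GPUs.
# Assumption: the workload is `num_residues` independent residue polynomials
# (RNS limbs of a ciphertext) that can be processed on separate devices.

def partition_residues(num_residues: int, num_gpus: int) -> list[range]:
    """Assign contiguous blocks of residue indices to each GPU,
    with block sizes differing by at most one."""
    base, extra = divmod(num_residues, num_gpus)
    parts = []
    start = 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs each take one additional residue.
        size = base + (1 if gpu < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts

# e.g. 7 RNS limbs over 3 GPUs -> block sizes 3, 2, 2
print([len(p) for p in partition_residues(7, 3)])  # → [3, 2, 2]
```

A scheme like this needs no user intervention for load balancing, which matches the user-friendliness claim above; the heterogeneous-cluster case in the paper would additionally weight block sizes by device throughput.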
ISSN: 1045-9219; 1558-2183
DOI: 10.1109/TPDS.2020.3021238