Communication-Efficient Secure Logistic Regression

Bibliographic Details
Published in: 2024 IEEE 9th European Symposium on Security and Privacy (EuroS&P), pp. 440-467
Main Authors: Agarwal, Amit; Peceny, Stanislav; Raykova, Mariana; Schoppmann, Phillipp; Seth, Karn
Format: Conference Proceeding
Language: English
Published: IEEE, 08.07.2024
Summary: We present a novel construction that enables two parties to securely train a logistic regression model on private secret-shared data. Our goal is to minimize online communication and round complexity, while still allowing for an efficient offline phase. As part of our construction, we develop many building blocks of independent interest. These include a new approximation technique for the sigmoid function that results in a secure protocol with better communication, protocols for secure powers evaluation and secure spline computation on fixed-point values, and a new comparison protocol that optimizes online communication. We also present a new two-party protocol for generating keys for distributed point functions (DPFs) over arithmetic sharing, where previous constructions do this only for Boolean outputs. We implement our protocol in an end-to-end system and benchmark its efficiency. We can securely evaluate a batch of 10^3 sigmoids with ≈ 0.5 MB of online communication, 4 online rounds, and ≈ 1.6 seconds of online time over WAN. This is ≈ 30× less online communication, ≈ 31× fewer online rounds, and ≈ 5.5× less online time than the well-known MP-SPDZ protocol. Our system can train a logistic regression model over 6 epochs on a database containing 70,000 samples and 15 features with 208.09 MB of online communication and 9.68 minutes of online time. We compare our logistic regression training against MP-SPDZ on a synthetic dataset of 1,000 samples and 10 features and show an improvement of ≈ 130× in online communication and ≈ 4.75× in online time over WAN. We converge to virtually the same model as plaintext training in all cases. We open-source our system and include extensive tests.
ISSN: 2995-1356
DOI: 10.1109/EuroSP60621.2024.00031