Secure Regression on Distributed Databases

Bibliographic Details
Published in: Journal of Computational and Graphical Statistics, Vol. 14, No. 2, pp. 263-279
Main Authors: Karr, Alan F.; Lin, Xiaodong; Sanil, Ashish P.; Reiter, Jerome P.
Format: Journal Article
Language: English
Published: Taylor & Francis, 01.06.2005
American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America

Summary: This article presents several methods for performing linear regression on the union of distributed databases that preserve, to varying degrees, the confidentiality of those databases. Such methods can be used by federal or state statistical agencies to share information from their individual databases, or to make such information available to others. Secure data integration, which provides the lowest level of protection, actually integrates the databases, but in such a way that no database owner can determine the origin of any records other than its own. Regression, associated diagnostics, or any other analysis can then be performed on the integrated data. Secure multiparty computation, based on shared local statistics, effects the computations necessary to compute least squares estimators of regression coefficients and error variances by means of analogous local computations that are combined additively using the secure summation protocol. We also provide two approaches to model diagnostics in this setting: one using shared residual statistics and the other using secure integration of synthetic residuals.
ISSN: 1061-8600, 1537-2715
DOI: 10.1198/106186005X47714
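The summary describes least squares estimates obtained by combining local computations additively through secure summation. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the common ring-style form of secure summation (the first party adds a random mask, each party adds its own value modulo a large integer, and the first party removes the mask), a fixed-point encoding of real numbers, and non-colluding parties that follow the protocol. The names `encode`, `decode`, `secure_sum`, `local_stats`, `solve`, and `secure_regression`, the modulus, and the scale factor are all hypothetical, and the parties are simulated in a single process.

```python
import random

# Hypothetical parameters: MODULUS must exceed twice the magnitude of any
# encoded total, and SCALE fixes the precision (6 decimal places here).
MODULUS = 2**127
SCALE = 10**6


def encode(x):
    """Map a real number to a fixed-point residue in [0, MODULUS)."""
    return round(x * SCALE) % MODULUS


def decode(v):
    """Invert encode(): residues in the upper half represent negatives."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def secure_sum(values, rng):
    """Ring-style secure summation of one value held by each party.

    Party 1 adds a uniform random mask, and every later party adds its own
    encoded value to the running total it receives, so each party sees only
    a masked partial sum. Party 1 finally subtracts the mask.
    """
    mask = rng.randrange(MODULUS)
    running = (mask + encode(values[0])) % MODULUS
    for v in values[1:]:
        running = (running + encode(v)) % MODULUS
    return decode((running - mask) % MODULUS)


def local_stats(X, y):
    """Statistics a party computes from its own rows only: X'X, X'y, y'y, n."""
    p = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    yty = sum(yi * yi for yi in y)
    return xtx, xty, yty, len(y)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def secure_regression(parties, rng):
    """Pooled OLS fit from securely summed local statistics."""
    stats = [local_stats(X, y) for X, y in parties]  # each party works locally
    p = len(stats[0][1])
    # Each entry of X'X and X'y, plus y'y and n, gets its own secure sum.
    xtx = [[secure_sum([s[0][a][b] for s in stats], rng) for b in range(p)]
           for a in range(p)]
    xty = [secure_sum([s[1][a] for s in stats], rng) for a in range(p)]
    yty = secure_sum([s[2] for s in stats], rng)
    n = round(secure_sum([s[3] for s in stats], rng))
    beta = solve(xtx, xty)
    # Because X'X beta = X'y, the residual sum of squares is y'y - beta'X'y.
    sse = yty - sum(bk * vk for bk, vk in zip(beta, xty))
    return beta, sse / (n - p)


# Usage: three parties hold disjoint rows of y = 1 + 2x (intercept column first).
parties = [
    ([[1.0, 0.0], [1.0, 1.0]], [1.0, 3.0]),
    ([[1.0, 2.0], [1.0, 3.0]], [5.0, 7.0]),
    ([[1.0, 4.0]], [9.0]),
]
beta, sigma2 = secure_regression(parties, random.Random(0))
print(beta, sigma2)  # → [1.0, 2.0] 0.0
```

This design follows from the additive structure the summary relies on. X'X, X'y, y'y, and n are each sums of per-party terms, so only masked running totals pass between parties, and the error variance estimate is SSE/(n - p). The sketch does not hide the final totals, which every party learns. It also leaves a party's value exposed if its two neighbours in the ring collude.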