Secure Regression on Distributed Databases
Published in | Journal of computational and graphical statistics Vol. 14; no. 2; pp. 263 - 279 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Taylor & Francis, 01.06.2005; American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America |
Summary: | This article presents several methods for performing linear regression on the union of distributed databases that preserve, to varying degrees, the confidentiality of those databases. Such methods can be used by federal or state statistical agencies to share information from their individual databases, or to make such information available to others. Secure data integration, which provides the lowest level of protection, actually integrates the databases, but in such a manner that no database owner can determine the origin of any records other than its own. Regression, associated diagnostics, or any other analysis can then be performed on the integrated data. Secure multiparty computation, based on shared local statistics, effects the computations necessary to compute least squares estimators of regression coefficients and error variances by means of analogous local computations that are combined additively using the secure summation protocol. We also provide two approaches to model diagnostics in this setting, one using shared residual statistics and the other using secure integration of synthetic residuals. |
---|---|
ISSN: | 1061-8600, 1537-2715 |
DOI: | 10.1198/106186005X47714 |
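
The summary describes least squares estimation from local statistics that are combined additively by secure summation. The sketch below illustrates that idea under stated assumptions and is not the article's own implementation: it simulates one common form of secure summation (the first party adds a random mask, each party adds its value modulo a large number, and the first party removes the mask), then applies it to the local quantities X'X, X'y, y'y, and n. The function names, the fixed-point `SCALE`, the `MODULUS`, and the synthetic data are all hypothetical choices made for illustration.

```python
import secrets
import numpy as np

MODULUS = 2**80   # assumption: far larger than any scaled total
SCALE = 10**6     # assumption: fixed-point precision for real values

def encode(x):
    """Map a real value to an integer residue modulo MODULUS."""
    return int(round(x * SCALE)) % MODULUS

def decode(n):
    """Map a residue back to a real value (upper half is negative)."""
    if n > MODULUS // 2:
        n -= MODULUS
    return n / SCALE

def secure_sum(local_values):
    """Simulate masked, sequential summation of one scalar per party."""
    mask = secrets.randbelow(MODULUS)      # known only to the first party
    running = mask
    for v in local_values:                 # each party adds its own value
        running = (running + encode(v)) % MODULUS
    return decode((running - mask) % MODULUS)  # first party removes the mask

def secure_sum_arrays(local_arrays):
    """Apply secure_sum entry by entry to same-shaped arrays."""
    stacked = np.stack(local_arrays)
    shape = stacked.shape[1:]
    flat = stacked.reshape(len(local_arrays), -1)
    totals = [secure_sum(flat[:, j].tolist()) for j in range(flat.shape[1])]
    return np.array(totals).reshape(shape)

def secure_regression(local_data):
    """local_data is a list of (X_k, y_k) pairs, one per database owner."""
    xtx = secure_sum_arrays([X.T @ X for X, _ in local_data])
    xty = secure_sum_arrays([X.T @ y for X, y in local_data])
    yty = secure_sum([float(y @ y) for _, y in local_data])
    n = round(secure_sum([float(len(y)) for _, y in local_data]))
    beta = np.linalg.solve(xtx, xty)       # least squares coefficients
    sse = yty - 2 * beta @ xty + beta @ xtx @ beta
    sigma2 = sse / (n - xtx.shape[0])      # error variance estimate
    return beta, sigma2

# Usage with synthetic data split across three hypothetical owners.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(90), rng.normal(size=(90, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=90)
parts = [(X[i:i + 30], y[i:i + 30]) for i in range(0, 90, 30)]
beta_hat, sigma2_hat = secure_regression(parts)
```

Only the summed statistics are revealed in this sketch; no party's rows are exchanged. A real deployment would also need to address collusion between parties and the information leaked by the totals themselves, which this simulation does not attempt.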