Generalization in the Face of Adaptivity: A Bayesian Perspective
Format: | Journal Article |
Language: | English |
Published: | 20.06.2021 |
Summary: | Advances in Neural Information Processing Systems, 36 (2024). Repeated use of a
data sample via adaptively chosen queries can rapidly lead to overfitting, wherein the
empirical evaluation of queries on the sample significantly deviates from their mean with
respect to the underlying data distribution. It turns out that simple noise-addition
algorithms suffice to prevent this issue, and differential privacy-based analysis of these
algorithms shows that they can handle an asymptotically optimal number of queries. However,
differential privacy's worst-case nature entails scaling such noise to the range of the
queries even for highly concentrated queries, or introducing more complex algorithms.
In this paper, we prove that straightforward noise-addition algorithms already provide
variance-dependent guarantees that also extend to unbounded queries. This improvement stems
from a novel characterization that illuminates the core problem of adaptive data analysis.
We show that the harm of adaptivity results from the covariance between the new query and a
Bayes factor-based measure of how much information about the data sample was encoded in the
responses given to past queries. We then leverage this characterization to introduce a new
data-dependent stability notion that can bound this covariance. |
DOI: | 10.48550/arxiv.2106.10761 |
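As a rough illustration of the "simple noise-addition" mechanisms the abstract refers to, the sketch below answers adaptively chosen statistical queries on a fixed sample by adding Gaussian noise to each empirical mean, so that later queries cannot overfit to the sample through the earlier answers. The class name `NoisyQueryAnswerer`, the noise scale, and the example queries are illustrative assumptions for this sketch, not the paper's algorithm or analysis.

```python
# Minimal sketch (assumed names and parameters) of a noise-addition mechanism
# for adaptive data analysis: each statistical query q: X -> [0, 1] is answered
# by its empirical mean on the sample plus Gaussian noise.
import numpy as np


class NoisyQueryAnswerer:
    """Answers statistical queries on a fixed sample with Gaussian noise."""

    def __init__(self, sample, noise_std=0.05, seed=0):
        self.sample = np.asarray(sample)
        self.noise_std = noise_std  # illustrative magnitude, not a tuned value
        self.rng = np.random.default_rng(seed)

    def answer(self, query):
        # Empirical mean of the query over the sample.
        empirical = float(np.mean([query(x) for x in self.sample]))
        # Gaussian noise masks sample-specific detail in the released answer.
        return empirical + self.rng.normal(0.0, self.noise_std)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.normal(size=1000)  # sample from an unknown distribution
    mechanism = NoisyQueryAnswerer(sample)

    # First query: fraction of points above 0.
    a1 = mechanism.answer(lambda x: float(x > 0))
    # Second query chosen adaptively, using the first noisy answer.
    a2 = mechanism.answer(lambda x: float(x > a1 - 0.5))
    print(a1, a2)
```

In this sketch the analyst's second query depends on the first released answer, which is exactly the adaptive reuse the abstract describes; the paper's contribution is a variance-dependent, Bayes factor-based analysis showing when such noisy answers continue to track the true population means.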