Towards a Unified Information-Theoretic Framework for Generalization
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 09.11.2021 |

Summary: | In this work, we investigate the expressiveness of the "conditional mutual information" (CMI) framework of Steinke and Zakynthinou (2020) and the prospect of using it to provide a unified framework for proving generalization bounds in the realizable setting. We first demonstrate that one can use this framework to express non-trivial (but sub-optimal) bounds for any learning algorithm that outputs hypotheses from a class of bounded VC dimension. We prove that the CMI framework yields the optimal bound on the expected risk of Support Vector Machines (SVMs) for learning halfspaces. This result is an application of our general result showing that stable compression schemes (Bousquet et al. (2020)) of size $k$ have uniformly bounded CMI of order $O(k)$. We further show that an inherent limitation of proper learning of VC classes contradicts the existence of a proper learner with constant CMI, which implies a negative resolution to an open problem of Steinke and Zakynthinou (2020). We further study the CMI of empirical risk minimizers (ERMs) of a class $H$ and show that it is possible to output all consistent classifiers (the version space) with bounded CMI if and only if $H$ has a bounded star number (Hanneke and Yang (2015)). Moreover, we prove a general reduction showing that "leave-one-out" analysis is expressible via the CMI framework. As a corollary, we investigate the CMI of the one-inclusion-graph algorithm proposed by Haussler et al. (1994). More generally, we show that the CMI framework is universal in the sense that for every consistent algorithm and data distribution, the expected risk vanishes as the number of samples diverges if and only if its evaluated CMI has sublinear growth with the number of samples. |
---|---|
DOI: | 10.48550/arxiv.2111.05275 |
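
As a reading aid, the following is a minimal LaTeX sketch of the CMI quantity referenced throughout the summary, assuming the standard setup of Steinke and Zakynthinou (2020); the notation ($\tilde{Z}$, $S$, $n$) is introduced here for illustration and does not appear in the record itself.

```latex
% Supersample setup (assumed, following Steinke and Zakynthinou (2020)):
% draw n i.i.d. pairs \tilde{Z} = (\tilde{Z}_{i,0}, \tilde{Z}_{i,1})_{i=1}^{n}
% from the data distribution D, and uniform selection bits
% S = (S_1, ..., S_n) in {0,1}^n; the training sample is
% \tilde{Z}_S = (\tilde{Z}_{i, S_i})_{i=1}^{n}.
\[
  \mathrm{CMI}_{\mathcal{D}}(A) \;=\; I\bigl(A(\tilde{Z}_S);\, S \,\big|\, \tilde{Z}\bigr).
\]
% For a loss bounded in [0,1], the expected generalization gap obeys
\[
  \mathbb{E}\Bigl[L_{\mathcal{D}}\bigl(A(\tilde{Z}_S)\bigr)
    - L_{\tilde{Z}_S}\bigl(A(\tilde{Z}_S)\bigr)\Bigr]
  \;\le\; \sqrt{\frac{2\,\mathrm{CMI}_{\mathcal{D}}(A)}{n}},
\]
% and for interpolating (zero empirical error) algorithms the rate
% improves to O(CMI_D(A) / n), the form relevant to the realizable
% setting studied in the paper.
```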
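On the same assumptions, one can sketch how the compression result stated in the summary delivers the SVM bound; the compression size $d+1$ for halfspaces in $\mathbb{R}^d$ is recalled here from Bousquet et al. (2020) as background, not taken from this record.

```latex
% Chain of bounds (constants suppressed): a stable compression scheme
% of size k has CMI of order O(k), which with the realizable-case rate
% gives expected risk O(k/n).
\[
  \text{stable compression of size } k
  \;\Longrightarrow\;
  \mathrm{CMI}_{\mathcal{D}}(A) = O(k)
  \;\Longrightarrow\;
  \mathbb{E}\bigl[L_{\mathcal{D}}(A)\bigr] = O\!\Bigl(\frac{k}{n}\Bigr).
\]
% For halfspaces in R^d, SVM is (per Bousquet et al. (2020)) a stable
% compression scheme of size at most d + 1, so
\[
  \mathbb{E}\bigl[L_{\mathcal{D}}(\mathrm{SVM})\bigr] = O\!\Bigl(\frac{d}{n}\Bigr),
\]
% matching the optimal rate claimed in the summary.
```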