Algorithmic Gaussianization through Sketching: Converting Data into Sub-gaussian Random Designs
Algorithmic Gaussianization is a phenomenon that can arise when using randomized sketching or sampling methods to produce smaller representations of large datasets: For certain tasks, these sketched representations have been observed to exhibit many robust performance characteristics that are known...
Saved in:
Main Author | |
---|---|
Format | Journal Article |
Language | English |
Published |
21.06.2022
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Algorithmic Gaussianization is a phenomenon that can arise when using
randomized sketching or sampling methods to produce smaller representations of
large datasets: For certain tasks, these sketched representations have been
observed to exhibit many robust performance characteristics that are known to
occur when a data sample comes from a sub-gaussian random design, which is a
powerful statistical model of data distributions. However, this phenomenon has
only been studied for specific tasks and metrics, or by relying on
computationally expensive methods. We address this by providing an algorithmic
framework for gaussianizing data distributions via averaging, proving that it
is possible to efficiently construct data sketches that are nearly
indistinguishable (in terms of total variation distance) from sub-gaussian
random designs. In particular, relying on a recently introduced sketching
technique called Leverage Score Sparsified (LESS) embeddings, we show that one
can construct an $n\times d$ sketch of an $N\times d$ matrix $A$, where $n\ll
N$, that is nearly indistinguishable from a sub-gaussian design, in time
$O(\text{nnz}(A)\log N + nd^2)$, where $\text{nnz}(A)$ is the number of
non-zero entries in $A$. As a consequence, strong statistical guarantees and
precise asymptotics available for the estimators produced from sub-gaussian
designs (e.g., for least squares and Lasso regression, covariance estimation,
low-rank approximation, etc.) can be straightforwardly adapted to our sketching
framework. We illustrate this with a new approximation guarantee for sketched
least squares, among other examples. |
---|---|
DOI: | 10.48550/arxiv.2206.10291 |