A faster subquadratic algorithm for finding outlier correlations
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published | 13.10.2015 |
Subjects | |
Summary: | We study the problem of detecting outlier pairs of strongly correlated
variables among a collection of $n$ variables with otherwise weak pairwise
correlations. After normalization, this amounts to the geometric task
where we are given as input a set of $n$ vectors with unit Euclidean norm and
dimension $d$, and for some constants $0<\tau<\rho<1$, we are asked to find all
the outlier pairs of vectors whose inner product is at least $\rho$ in absolute
value, subject to the promise that all but at most $q$ pairs of vectors have
inner product at most $\tau$ in absolute value.
Improving on an algorithm of G. Valiant [FOCS 2012; J. ACM 2015], we present
a randomized algorithm that for Boolean inputs ($\{-1,1\}$-valued data
normalized to unit Euclidean length) runs in time \[ \tilde
O\bigl(n^{\max\,\{1-\gamma+M(\Delta\gamma,\gamma),\,M(1-\gamma,2\Delta\gamma)\}}+qdn^{2\gamma}\bigr)\,,
\] where $0<\gamma<1$ is a constant tradeoff parameter and $M(\mu,\nu)$ is the
exponent to multiply an $\lfloor n^\mu\rfloor\times\lfloor n^\nu\rfloor$ matrix
with an $\lfloor n^\nu\rfloor\times \lfloor n^\mu\rfloor$ matrix, and
$\Delta=1/(1-\log_\tau\rho)$. As corollaries we obtain randomized algorithms
that run in time \[ \tilde
O\bigl(n^{\frac{2\omega}{3-\log_\tau\rho}}+qdn^{\frac{2(1-\log_\tau\rho)}{3-\log_\tau\rho}}\bigr)
\] and in time \[ \tilde
O\bigl(n^{\frac{4}{2+\alpha(1-\log_\tau\rho)}}+qdn^{\frac{2\alpha(1-\log_\tau\rho)}{2+\alpha(1-\log_\tau\rho)}}\bigr)\,,
\] where $2\leq\omega<2.38$ is the exponent for square matrix multiplication
and $0.3<\alpha\leq 1$ is the exponent for rectangular matrix multiplication.
The notation $\tilde O(\cdot)$ hides polylogarithmic factors in $n$ and $d$
whose degree may depend on $\rho$ and $\tau$. We present further corollaries
for the light bulb problem and for learning sparse Boolean functions. |
---|---|
DOI: | 10.48550/arxiv.1510.03895 |
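
To make the problem statement and the corollary bounds in the summary concrete, here is a minimal Python sketch. The function names `corollary_exponents` and `brute_force_outliers` are hypothetical and do not come from the paper. The first function only evaluates the exponents of $n$ in the two corollary running times for given $\rho$ and $\tau$; its defaults `omega=2.38` and `alpha=0.3` are the boundary values quoted in the summary, used here as illustrative placeholders. The second function is a plain quadratic-time baseline that checks every pair, not the subquadratic algorithm the summary describes.

```python
import math

import numpy as np


def corollary_exponents(rho, tau, omega=2.38, alpha=0.3):
    """Exponents of n in the two corollary bounds quoted in the summary.

    Returns ((main, q_term), (main, q_term)) for the square and the
    rectangular matrix multiplication corollaries. The defaults for omega
    and alpha are assumptions taken from the stated ranges.
    """
    if not 0 < tau < rho < 1:
        raise ValueError("expected 0 < tau < rho < 1")
    ell = math.log(rho) / math.log(tau)  # log_tau(rho), lies in (0, 1)
    square = (2 * omega / (3 - ell), 2 * (1 - ell) / (3 - ell))
    denom = 2 + alpha * (1 - ell)
    rect = (4 / denom, 2 * alpha * (1 - ell) / denom)
    return square, rect


def brute_force_outliers(X, rho):
    """Quadratic-time baseline, NOT the paper's algorithm.

    X is an n x d array whose rows have unit Euclidean norm. Returns all
    index pairs (i, j), i < j, with |<x_i, x_j>| >= rho.
    """
    G = X @ X.T  # all pairwise inner products
    rows, cols = np.nonzero(np.triu(np.abs(G) >= rho, k=1))
    return list(zip(rows.tolist(), cols.tolist()))


if __name__ == "__main__":
    # Three {-1, 1}-valued vectors in dimension 4, scaled to unit norm.
    # Rows 0 and 1 are identical; row 2 is orthogonal to both.
    X = np.array([[1, 1, 1, 1], [1, 1, 1, 1], [1, -1, 1, -1]]) / 2.0
    print(brute_force_outliers(X, rho=0.9))
    print(corollary_exponents(rho=0.5, tau=0.25))
```

For $\rho=0.5$ and $\tau=0.25$ we have $\log_\tau\rho=1/2$, so with the placeholder $\omega=2.38$ the first corollary's main exponent evaluates to $2\cdot 2.38/2.5=1.904$, which is below the exponent 2 of the brute-force baseline.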