A Sequential Non-Parametric Multivariate Two-Sample Test
Published in | IEEE Transactions on Information Theory, Vol. 64, no. 5, pp. 3361-3370 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | New York: IEEE, 01.05.2018; The Institute of Electrical and Electronics Engineers, Inc. (IEEE); Institute of Electrical and Electronics Engineers |
Summary: | Given samples from two distributions, a non-parametric two-sample test aims at determining whether the two distributions are equal or not, based on a test statistic. Classically, this statistic is computed on the whole data set, or is computed on a subset of the data set by a function trained on its complement. We consider methods in a third tier, so as to deal with large (possibly infinite) data sets, and to automatically determine the most relevant scales to work at, making two contributions. First, we develop a generic sequential non-parametric testing framework, in which the sample size need not be fixed in advance. This makes our test a truly sequential non-parametric multivariate two-sample test. Under information theoretic conditions qualifying the difference between the tested distributions, consistency of the two-sample test is established. Second, we instantiate our framework using nearest neighbor regressors, and show how the power of the resulting two-sample test can be improved using Bayesian mixtures and switch distributions. This combination of techniques yields automatic scale selection, and experiments performed on challenging data sets show that our sequential tests exhibit comparable performances to those of state-of-the-art non-sequential tests. |
---|---|
ISSN: | 0018-9448 1557-9654 |
DOI: | 10.1109/TIT.2018.2800658 |
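
The summary describes a sequential test built on nearest neighbor regressors. As a rough illustration of how such a test can work, the sketch below predicts, for each incoming point, which of the two samples it came from, and accumulates the likelihood ratio of those predictions against a fair coin. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the fixed neighborhood size `k`, the Laplace smoothing, and the 1/2 label prior are all illustrative choices, and the Bayesian mixture and switch distribution mentioned in the summary are omitted. Under the null hypothesis the accumulated ratio is a nonnegative martingale with mean 1, so stopping when it reaches 1/alpha keeps the false rejection probability at or below alpha (Ville's inequality).

```python
import math
import random


def knn_label_probability(history, x, k):
    """Estimate P(label = 1 | x) from past (point, label) pairs by a k-NN vote.

    Laplace smoothing keeps the estimate strictly inside (0, 1), so the
    log-likelihood used by the test is always finite.
    """
    if not history:
        return 0.5
    nearest = sorted(history, key=lambda item: math.dist(item[0], x))[:k]
    ones = sum(label for _, label in nearest)
    return (ones + 1.0) / (len(nearest) + 2.0)


def sequential_two_sample_test(stream, alpha=0.05, k=5):
    """Consume (x, label) pairs one at a time; labels are 0 or 1 with prior 1/2.

    Returns (rejected, n_seen). The test stops as soon as the likelihood ratio
    of the online k-NN predictor against the fair-coin null reaches 1 / alpha.
    """
    log_ratio = 0.0
    threshold = math.log(1.0 / alpha)
    history = []
    n_seen = 0
    for x, label in stream:
        n_seen += 1
        # Predict before the label is added to the history (predictable bet).
        p_one = knn_label_probability(history, x, k)
        p_observed = p_one if label == 1 else 1.0 - p_one
        log_ratio += math.log(p_observed) - math.log(0.5)
        history.append((x, label))
        if log_ratio >= threshold:
            return True, n_seen
    return False, n_seen


def labelled_stream(sample_p, sample_q, n, rng):
    """Yield n points, each drawn from P (label 0) or Q (label 1) by a fair coin."""
    for _ in range(n):
        label = rng.randint(0, 1)
        x = sample_q(rng) if label == 1 else sample_p(rng)
        yield x, label


def gaussian_2d(mean):
    """Return a sampler for an isotropic 2-D Gaussian with unit variance."""
    return lambda rng: (rng.gauss(mean, 1.0), rng.gauss(mean, 1.0))


if __name__ == "__main__":
    rng = random.Random(0)
    stream = labelled_stream(gaussian_2d(0.0), gaussian_2d(3.0), 500, rng)
    print(sequential_two_sample_test(stream, alpha=0.05, k=5))
```

Because the decision is checked after every observation, the sample size does not need to be fixed in advance; the brute-force neighbor search here costs time linear in the history at each step and would need a spatial index for large streams.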