A Sequential Non-Parametric Multivariate Two-Sample Test

Bibliographic Details
Published in IEEE Transactions on Information Theory Vol. 64; no. 5; pp. 3361-3370
Main Authors Lheritier, Alix; Cazals, Frederic
Format Journal Article
Language English
Published New York IEEE 01.05.2018
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Institute of Electrical and Electronics Engineers

Summary: Given samples from two distributions, a non-parametric two-sample test aims at determining whether the two distributions are equal or not, based on a test statistic. Classically, this statistic is computed on the whole data set, or is computed on a subset of the data set by a function trained on its complement. We consider methods in a third tier, so as to deal with large (possibly infinite) data sets, and to automatically determine the most relevant scales to work at, making two contributions. First, we develop a generic sequential non-parametric testing framework, in which the sample size need not be fixed in advance. This makes our test a truly sequential non-parametric multivariate two-sample test. Under information theoretic conditions qualifying the difference between the tested distributions, consistency of the two-sample test is established. Second, we instantiate our framework using nearest neighbor regressors, and show how the power of the resulting two-sample test can be improved using Bayesian mixtures and switch distributions. This combination of techniques yields automatic scale selection, and experiments performed on challenging data sets show that our sequential tests exhibit comparable performances to those of state-of-the-art non-sequential tests.
ISSN: 0018-9448
1557-9654
DOI: 10.1109/TIT.2018.2800658