AutoML Two-Sample Test

Two-sample tests are important in statistics and machine learning, both as tools for scientific discovery as well as to detect distribution shifts. This led to the development of many sophisticated test procedures going beyond the standard supervised learning frameworks, whose usage can require spec...

Full description

Saved in:

Bibliographic Details
Published in	arXiv.org
Main Authors	Kübler, Jonas M, Stimper, Vincent, Buchholz, Simon, Muandet, Krikamol, Schölkopf, Bernhard
Format	Paper
Language	English
Published	Ithaca Cornell University Library, arXiv.org 15.01.2023
Subjects	Machine learning Test procedures
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Two-sample tests are important in statistics and machine learning, both as tools for scientific discovery as well as to detect distribution shifts. This led to the development of many sophisticated test procedures going beyond the standard supervised learning frameworks, whose usage can require specialized knowledge about two-sample testing. We use a simple test that takes the mean discrepancy of a witness function as the test statistic and prove that minimizing a squared loss leads to a witness with optimal testing power. This allows us to leverage recent advancements in AutoML. Without any user input about the problems at hand, and using the same method for all our experiments, our AutoML two-sample test achieves competitive performance on a diverse distribution shift benchmark as well as on challenging two-sample testing problems. We provide an implementation of the AutoML two-sample test in the Python package autotst.
ISSN:	2331-8422