Testing the equality of distributions using integrated maximum mean discrepancy
Published in | Journal of statistical planning and inference Vol. 236; p. 106246 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V, 01.05.2025 |
ISSN | 0378-3758 |
DOI | 10.1016/j.jspi.2024.106246 |
Summary: | Comparing two independent random samples and testing them for homogeneity is a fundamental statistical problem with applications across many fields. However, existing methods may not be effective when the data are complex or high-dimensional. We propose a new method that integrates the maximum mean discrepancy (MMD) with a Gaussian kernel over all one-dimensional projections of the data. We derive a closed-form expression for the integrated MMD and prove that it is a valid distributional similarity metric. We estimate the integrated MMD using U-statistic theory and study its asymptotic behavior under the null hypothesis and two kinds of alternative hypotheses. We demonstrate that our method retains the benefits of the MMD and outperforms existing methods on both synthetic and real datasets, especially when the data are complex and high-dimensional.
• A new two-sample test using integrated MMD. • A simple yet valid distributional similarity metric known as Integrated MMD. • Asymptotic behaviors under both null and two alternative hypotheses. • Outperforming existing methods on complex and high-dimensional data. |
---|---|
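The summary describes averaging a Gaussian-kernel MMD over one-dimensional projections and estimating it with a U-statistic. The following is only a rough sketch of that idea, not the article's method: the record does not reproduce the closed-form expression, so the integral over projections is approximated here by Monte Carlo sampling of random unit directions, and the null distribution is calibrated by permutation instead of the asymptotic theory. All function names, the fixed `bandwidth`, and the defaults for `n_proj` and `n_perm` are assumptions made for illustration.

```python
import numpy as np


def mmd2_unbiased_1d(x, y, bandwidth):
    """Unbiased U-statistic estimate of squared MMD for 1-D samples (Gaussian kernel)."""
    m, n = len(x), len(y)

    def gram(a, b):
        d = a[:, None] - b[None, :]
        return np.exp(-d ** 2 / (2.0 * bandwidth ** 2))

    kxx, kyy, kxy = gram(x, x), gram(y, y), gram(x, y)
    # Drop diagonal terms so the within-sample averages are U-statistics.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()


def random_directions(dim, n_proj, rng):
    """Sample unit vectors uniformly on the sphere (normalized Gaussian draws)."""
    u = rng.standard_normal((n_proj, dim))
    return u / np.linalg.norm(u, axis=1, keepdims=True)


def projected_mmd2(X, Y, directions, bandwidth=1.0):
    """Average the 1-D MMD^2 estimate over the supplied projection directions.

    Assumption: this Monte Carlo average stands in for the exact integral
    over all projections, which the article evaluates in closed form.
    """
    stats = [mmd2_unbiased_1d(X @ u, Y @ u, bandwidth) for u in directions]
    return float(np.mean(stats))


def permutation_test(X, Y, n_proj=50, n_perm=200, bandwidth=1.0, seed=0):
    """Permutation p-value for H0: both samples share one distribution."""
    rng = np.random.default_rng(seed)
    dirs = random_directions(X.shape[1], n_proj, rng)
    observed = projected_mmd2(X, Y, dirs, bandwidth)

    pooled = np.vstack([X, Y])
    m = len(X)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        stat = projected_mmd2(pooled[perm[:m]], pooled[perm[m:]], dirs, bandwidth)
        exceed += int(stat >= observed)
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling `permutation_test(X, Y)` on two arrays of shape `(n_samples, n_features)` returns the observed statistic and a permutation p-value. The same directions are reused for every permutation so that the observed and permuted statistics are directly comparable.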