A Coincidence-Based Test for Uniformity Given Very Sparsely Sampled Discrete Data

Bibliographic Details
Published in IEEE Transactions on Information Theory Vol. 54; no. 10; pp. 4750-4755
Main Author Paninski, L.
Format Journal Article
Language English
Published New York, NY IEEE 01.10.2008
Institute of Electrical and Electronics Engineers
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: How many independent samples $N$ do we need from a distribution $p$ to decide that $p$ is $\epsilon$-distant from uniform in an $L_1$ sense, $\sum_{i=1}^{m} |p(i) - 1/m| > \epsilon$? (Here $m$ is the number of bins on which the distribution is supported, and is assumed known a priori.) Somewhat surprisingly, we only need $N \epsilon^2 \gg m^{1/2}$ to make this decision reliably (this condition is both sufficient and necessary). The test for uniformity introduced here is based on the number of observed "coincidences" (samples that fall into the same bin), the mean and variance of which may be computed explicitly for the uniform distribution and bounded nonparametrically for any distribution that is known to be $\epsilon$-distant from uniform. Some connections to the classical birthday problem are noted.
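
As a rough illustration of the idea in the summary, the Python sketch below counts coincidences in a sample and compares the count with its mean under the uniform distribution. The definition used here (number of samples minus number of distinct occupied bins) is an assumption made for illustration and may differ from the exact statistic in the article. The function names are hypothetical, and no decision threshold is given because the summary does not state one.

```python
def count_coincidences(samples):
    """Count samples that land in an already-occupied bin.

    Assumed definition: N minus the number of distinct bins observed.
    """
    return len(samples) - len(set(samples))


def expected_coincidences(n, p):
    """Mean of the count above for n independent draws from distribution p.

    Bin i is occupied with probability 1 - (1 - p[i])**n, so the expected
    number of distinct bins is the sum of those probabilities.
    """
    expected_distinct = sum(1.0 - (1.0 - pi) ** n for pi in p)
    return n - expected_distinct


def expected_coincidences_uniform(n, m):
    """Special case of expected_coincidences for the uniform law on m bins."""
    return expected_coincidences(n, [1.0 / m] * m)


if __name__ == "__main__":
    m, n = 10_000, 500  # sparse regime: far fewer samples than bins
    uniform_mean = expected_coincidences_uniform(n, m)
    # A hypothetical alternative: half the bins carry 1.5/m, half carry 0.5/m,
    # which gives an L1 distance of 0.5 from uniform.
    skewed = [1.5 / m] * (m // 2) + [0.5 / m] * (m // 2)
    skewed_mean = expected_coincidences(n, skewed)
    print(f"uniform mean: {uniform_mean:.3f}, skewed mean: {skewed_mean:.3f}")
```

Under this definition, a distribution far from uniform tends to produce more coincidences on average than the uniform one, which is what makes the count usable as a test statistic even when most bins are empty.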
ISSN: 0018-9448, 1557-9654
DOI: 10.1109/TIT.2008.928987