PAPRIKA: Private Online False Discovery Rate Control

Bibliographic Details
Main Authors	Zhang, Wanrong; Kamath, Gautam; Cummings, Rachel
Format	Journal Article
Language	English
Published	27.02.2020
Subjects	Computer Science - Cryptography and Security; Computer Science - Data Structures and Algorithms; Computer Science - Learning; Mathematics - Statistics Theory; Statistics - Machine Learning; Statistics - Methodology; Statistics - Theory

Summary:	In hypothesis testing, a false discovery occurs when a hypothesis is incorrectly rejected due to noise in the sample. When adaptively testing multiple hypotheses, the probability of a false discovery increases as more tests are performed. The problem of False Discovery Rate (FDR) control is therefore to find a procedure for testing multiple hypotheses that accounts for this effect when determining the set of hypotheses to reject. The goal is to minimize the number (or fraction) of false discoveries while maintaining a high true positive rate (i.e., correct discoveries). In this work, we study FDR control in multiple hypothesis testing under the constraint of differential privacy for the sample. Unlike previous work in this direction, we focus on the online setting, in which a decision about each hypothesis must be made immediately after the test is performed, rather than after the output of all tests is available, as in the offline setting. We provide new private algorithms based on state-of-the-art results in non-private online FDR control. Our algorithms have strong provable guarantees for privacy and for statistical performance, as measured by FDR and power. We also provide experimental results to demonstrate the efficacy of our algorithms in a variety of data environments.
DOI:	10.48550/arxiv.2002.12321
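
Illustrative note: the summary states that the probability of a false discovery grows with the number of tests, and that performance is measured by the fraction of discoveries that are false. The sketch below is not taken from the paper and is not the PAPRIKA algorithm. It only illustrates those two quantities under simplifying assumptions: the tests are independent, every null hypothesis is true, and each test is run at the same significance level `alpha`. The function names are hypothetical.

```python
from typing import Sequence


def prob_any_false_discovery(alpha: float, m: int) -> float:
    """Probability of at least one false rejection among m tests.

    Assumes (for illustration only) that the m tests are independent,
    that every null hypothesis is true, and that each test is run at
    significance level alpha with no multiplicity correction.
    """
    return 1.0 - (1.0 - alpha) ** m


def false_discovery_proportion(
    rejected: Sequence[bool], is_null: Sequence[bool]
) -> float:
    """Fraction of rejections that are false: V / max(R, 1).

    rejected[i] is True if hypothesis i was rejected; is_null[i] is True
    if hypothesis i is a true null. With no rejections the proportion is
    taken to be 0, the usual convention.
    """
    if len(rejected) != len(is_null):
        raise ValueError("rejected and is_null must have the same length")
    num_rejected = sum(rejected)
    num_false = sum(r and n for r, n in zip(rejected, is_null))
    return num_false / max(num_rejected, 1)


if __name__ == "__main__":
    # The chance of at least one false discovery rises quickly with m.
    for m in (1, 5, 20, 100):
        print(m, round(prob_any_false_discovery(0.05, m), 4))
```

Under these assumptions, 20 uncorrected tests at level 0.05 already give roughly a 64% chance of at least one false discovery. The FDR is the expected value of the proportion computed by `false_discovery_proportion`, taken over the randomness in the sample.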