On Online Control of False Discovery Rate

(1502.06197)
Published Feb 22, 2015 in stat.ME, cs.LG, math.ST, stat.AP, and stat.TH

Abstract

Multiple hypothesis testing is a core problem in statistical inference and arises in almost every scientific field. Given a sequence of null hypotheses $\mathcal{H}(n) = (H_1, \dots, H_n)$, Benjamini and Hochberg \cite{benjamini1995controlling} introduced the false discovery rate (FDR) criterion, which is the expected proportion of false positives among rejected null hypotheses, and proposed a testing procedure that controls FDR below a pre-assigned significance level. They also proposed a different criterion, called mFDR, which does not control a property of the realized set of tests; rather, it controls the ratio of the expected number of false discoveries to the expected number of discoveries. In this paper, we propose two procedures for multiple hypothesis testing that we will call "LOND" and "LORD". These procedures control FDR and mFDR in an \emph{online manner}. Concretely, we consider an ordered (possibly infinite) sequence of null hypotheses $\mathcal{H} = (H_1, H_2, H_3, \dots)$ where, at each step $i$, the statistician must decide whether to reject hypothesis $H_i$ having access only to the previous decisions. To the best of our knowledge, our work is the first that controls FDR in this setting. This model was introduced by Foster and Stine \cite{alpha-investing}, whose alpha-investing rule only controls mFDR in an online manner. In order to compare different procedures, we develop lower bounds on the total discovery rate under the mixture model and prove that both LOND and LORD achieve a nearly linear number of discoveries. We further propose an adjustment to LOND to address arbitrary correlation among the $p$-values. Finally, we evaluate the performance of our procedures on both synthetic and real data, comparing them with the alpha-investing rule, the Benjamini-Hochberg method, and a Bonferroni procedure.
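To make the online setting concrete, below is a minimal Python sketch of an online testing loop in the spirit of the paper's LOND rule ("significance Levels based On Number of Discoveries"), as read from the abstract: each level $\alpha_i$ depends only on a fixed budget sequence $(\beta_i)$ summing to the target level $\alpha$ and on the number of discoveries made so far. The specific level-update formula and the budget choice $\beta_i = \alpha / (i(i+1))$ are illustrative assumptions, not a verbatim reproduction of the paper's procedure.

```python
import numpy as np


def lond(p_values, alpha=0.05):
    """Sketch of an online multiple-testing loop in the spirit of LOND.

    Assumed form (not verbatim from the paper): the level for the i-th test is
    alpha_i = beta_i * (D_{i-1} + 1), where D_{i-1} counts discoveries among
    the first i-1 tests and (beta_i) is a nonnegative sequence whose sum is at
    most alpha.  The choice beta_i = alpha / (i * (i + 1)) is illustrative.
    """
    decisions = []
    num_discoveries = 0
    for i, p in enumerate(p_values, start=1):
        beta_i = alpha / (i * (i + 1))            # sums to alpha over i = 1, 2, ...
        alpha_i = beta_i * (num_discoveries + 1)  # level grows with past discoveries
        reject = p <= alpha_i                     # decision uses only past decisions
        decisions.append(reject)
        num_discoveries += int(reject)
    return decisions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stream mixing nulls (uniform p-values) with non-nulls (tiny p-values).
    pvals = np.concatenate([rng.uniform(size=50), rng.uniform(size=10) * 1e-4])
    rng.shuffle(pvals)
    print(sum(lond(pvals)), "discoveries out of", len(pvals), "hypotheses")
```

Note that the loop respects the online constraint described above: at step $i$ it uses only the hypotheses already tested and the decisions already made, never future $p$-values.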
