In statistical hypothesis testing, specifically multiple hypothesis testing, the q-value in the Storey procedure provides a means to control the positive false discovery rate (pFDR). [1] Just as the p-value gives the expected false positive rate obtained by rejecting the null hypothesis for any result with an equal or smaller p-value, the q-value gives the expected pFDR obtained by rejecting the null hypothesis for any result with an equal or smaller q-value. [2]
In statistics, testing multiple hypotheses simultaneously using methods appropriate for testing single hypotheses tends to yield many false positives: the so-called multiple comparisons problem. [3] For example, assume that one were to test 1,000 null hypotheses, all of which are true, and (as is conventional in single hypothesis testing) to reject null hypotheses with a significance level of 0.05; due to random chance, one would expect 5% of the results to appear significant (P < 0.05), yielding about 50 false positives (rejections of the null hypothesis). [4] Since the 1950s, statisticians had been developing methods for multiple comparisons that reduced the number of false positives, such as controlling the family-wise error rate (FWER) using the Bonferroni correction, but these methods also increased the number of false negatives (i.e. reduced the statistical power). [3] In 1995, Yoav Benjamini and Yosef Hochberg proposed controlling the false discovery rate (FDR) as a more statistically powerful alternative to controlling the FWER in multiple hypothesis testing. [3] The pFDR and the q-value were introduced by John D. Storey in 2002 in order to improve upon a limitation of the FDR, namely that the FDR is not defined when there are no positive results. [1] [5]
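The arithmetic in the 1,000-test example can be checked with a short simulation. The sketch below is illustrative only: it assumes continuous test statistics, so that the p-values of true null hypotheses are uniformly distributed on [0, 1], and the function name `simulate_all_null` and the random seed are hypothetical. It also applies the Bonferroni correction (rejecting only when p < 0.05/1,000) to show how sharply that stricter threshold cuts the number of rejections.

```python
import random


def simulate_all_null(m, alpha, seed=0):
    """Simulate m tests in which every null hypothesis is true.

    For a continuous test statistic, a p-value computed under a true null is
    uniformly distributed on [0, 1], so each test is rejected at level alpha
    with probability alpha. Returns (naive, bonferroni) rejection counts.
    """
    rng = random.Random(seed)
    pvalues = [rng.random() for _ in range(m)]
    naive = sum(p < alpha for p in pvalues)           # single-test threshold
    bonferroni = sum(p < alpha / m for p in pvalues)  # Bonferroni (FWER) threshold
    return naive, bonferroni


m, alpha = 1000, 0.05
print("expected false positives:", m * alpha)
naive, bonferroni = simulate_all_null(m, alpha)
print("rejections at p < alpha:", naive)
print("rejections at p < alpha / m:", bonferroni)
```

Every rejection here is a false positive, since all nulls are true; the uncorrected count fluctuates around 50 from seed to seed, while the Bonferroni count is almost always 0.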
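Let there be a null hypothesis $H_0$ and an alternative hypothesis $H_1$. Perform $m$ hypothesis tests; let the test statistics be i.i.d. random variables $T_1, \ldots, T_m$ such that $T_i \mid H_i \sim (1 - H_i) \cdot F_0 + H_i \cdot F_1$. That is, if $H_0$ is true for test $i$ ($H_i = 0$), then $T_i$ follows the null distribution $F_0$; while if $H_1$ is true ($H_i = 1$), then $T_i$ follows the alternative distribution $F_1$. Let $H_i \sim \operatorname{Bernoulli}(\pi_1)$, that is, for each test, $H_1$ is true with probability $\pi_1$ and $H_0$ is true with probability $\pi_0 = 1 - \pi_1$. Denote the critical region (the values of $T_i$ for which $H_0$ is rejected) at significance level $\alpha$ by $\Gamma_\alpha$. Let an experiment yield a value $t$ for the test statistic. The q-value of $t$ is formally defined as

$$\text{q-value}(t) = \inf_{\{\Gamma_\alpha :\, t \in \Gamma_\alpha\}} \operatorname{pFDR}(\Gamma_\alpha).$$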
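That is, the q-value is the infimum of the pFDR over all critical regions $\Gamma_\alpha$ that contain the observed value $t$, i.e. over all regions under which $H_0$ would be rejected for this test. Equivalently, the q-value equals

$$\text{q-value}(t) = \inf_{\{\Gamma_\alpha :\, t \in \Gamma_\alpha\}} \Pr(H_0 \text{ is true} \mid T \in \Gamma_\alpha),$$

which is the infimum of the probability that $H_0$ is true given that $H_0$ is rejected (the positive false discovery rate). [1]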
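To make the definition concrete, the sketch below evaluates it in one hypothetical instance of the two-groups model: a one-sided test where the statistic follows $F_0 = N(0, 1)$ under $H_0$ and $F_1 = N(\mu, 1)$ with $\mu > 0$ under $H_1$, $H_0$ is true with probability $\pi_0$, and critical regions have the form $[c, \infty)$. These distributional choices, the values $\pi_0 = 0.9$ and $\mu = 2.5$, and all function names are assumptions made for illustration. Under them a region contains $t$ exactly when $c \le t$, and the infimum is attained at $c = t$, so the q-value reduces to $\pi_0 \Pr(T \ge t \mid H_0) / \Pr(T \ge t)$. A Monte Carlo check estimates the same posterior probability by counting how many rejected statistics came from true nulls.

```python
import math
import random


def normal_sf(x):
    """Upper-tail probability 1 - Phi(x) of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def p_value(t):
    """Pr(T >= t | H0) for the assumed null distribution F0 = N(0, 1)."""
    return normal_sf(t)


def q_value(t, pi0, mu):
    """Pr(H0 true | T >= t) in the assumed model F0 = N(0, 1), F1 = N(mu, 1).

    With mu > 0 the likelihood ratio f1/f0 increases in t, so the pFDR of
    the region [c, inf) decreases as c grows; the infimum over the regions
    containing t (those with c <= t) is therefore attained at c = t.
    """
    null_part = pi0 * normal_sf(t)               # pi0 * Pr(T >= t | H0)
    alt_part = (1.0 - pi0) * normal_sf(t - mu)   # pi1 * Pr(T >= t | H1)
    return null_part / (null_part + alt_part)


def simulated_q_value(t, pi0, mu, n=100_000, seed=1):
    """Monte Carlo estimate: fraction of rejections (T >= t) whose H0 is true."""
    rng = random.Random(seed)
    rejected = false_discoveries = 0
    for _ in range(n):
        null_true = rng.random() < pi0
        stat = rng.gauss(0.0, 1.0) if null_true else rng.gauss(mu, 1.0)
        if stat >= t:
            rejected += 1
            false_discoveries += null_true
    return false_discoveries / rejected  # assumes at least one rejection


for t in (1.0, 2.0, 3.0):
    print(f"t={t}: p={p_value(t):.4f}, q={q_value(t, pi0=0.9, mu=2.5):.4f}, "
          f"simulated q={simulated_q_value(t, pi0=0.9, mu=2.5):.4f}")
```

In this model the q-value never exceeds $\pi_0$, and it shrinks faster than the p-value as $t$ grows, because large statistics are increasingly likely to come from $F_1$.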
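The p-value is defined as

$$\text{p-value}(t) = \inf_{\{\Gamma_\alpha :\, t \in \Gamma_\alpha\}} \Pr(T \in \Gamma_\alpha \mid H_0 \text{ is true}),$$

the infimum of the probability that $H_0$ is rejected given that $H_0$ is true (the false positive rate). Comparing the two definitions shows that they condition in opposite directions: the p-value is a probability of rejection given that $H_0$ is true, whereas the q-value is the minimum posterior probability that $H_0$ is true given that it is rejected. [1]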
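The link between the two conditional probabilities is Bayes' theorem. For a single critical region $\Gamma_\alpha$ with positive rejection probability, and with $H_0$ true with probability $\pi_0$ and $H_1$ true with probability $\pi_1 = 1 - \pi_0$ as in the model above:

$$\Pr(H_0 \text{ is true} \mid T \in \Gamma_\alpha) = \frac{\pi_0 \, \Pr(T \in \Gamma_\alpha \mid H_0 \text{ is true})}{\pi_0 \, \Pr(T \in \Gamma_\alpha \mid H_0 \text{ is true}) + \pi_1 \, \Pr(T \in \Gamma_\alpha \mid H_1 \text{ is true})}$$

The numerator contains, up to the factor $\pi_0$, the false positive rate that appears in the p-value; dividing by the overall rejection probability $\Pr(T \in \Gamma_\alpha)$ turns it into the posterior probability that appears in the q-value. Because both definitions take an infimum over regions, this relation holds region by region rather than as a fixed rescaling of one value into the other.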
The q-value of a test can be interpreted as the false discovery rate (FDR) incurred when that test and every test at least as significant are called significant, where the FDR is the proportion of false positives among all positive results. Given a set of test statistics and their associated q-values, rejecting the null hypothesis for all tests whose q-value is less than or equal to some threshold $\alpha$ controls the expected false discovery rate at level $\alpha$. [6]
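In practice $F_0$, $F_1$ and $\pi_0$ are unknown, so q-values are estimated from the observed p-values. The sketch below uses a simplified estimator in the spirit of Storey's approach: $\pi_0$ is estimated from the fraction of the $m$ p-values above a tuning parameter $\lambda$ (fixed here at 0.5, an assumption; practical implementations choose it more carefully), and the q-value of the $i$-th smallest p-value is the smallest value of $\hat\pi_0 m p_{(j)} / j$ over all $j \ge i$. The function names and the example p-values are hypothetical, and the sketch assumes valid, roughly independent p-values and enough tests for the estimate of $\pi_0$ to be stable.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimate pi0, the proportion of true nulls, from p-values above lam.

    Null p-values are uniform, so about pi0 * m * (1 - lam) of them should
    exceed lam; p-values from true alternatives are assumed to rarely do so.
    """
    m = len(pvalues)
    above = sum(p > lam for p in pvalues)
    return min(1.0, above / (m * (1.0 - lam)))


def q_values(pvalues, lam=0.5):
    """Estimated q-value for each p-value, returned in the input order."""
    m = len(pvalues)
    pi0 = estimate_pi0(pvalues, lam)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down so each q is a minimum over ranks j >= i.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


pvals = [0.01, 0.02, 0.03, 0.04, 0.5, 0.6, 0.7, 0.8]
qs = q_values(pvals)
alpha = 0.1
rejected = [p for p, q in zip(pvals, qs) if q <= alpha]
print(rejected)
```

With these eight p-values, three exceed 0.5, so $\hat\pi_0 = 3 / (8 \times 0.5) = 0.75$; the four smallest p-values all receive q-values of about 0.06, so they are rejected at a threshold of 0.1 but not at 0.05.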
Genome-wide analyses of differential gene expression involve simultaneously testing the expression of thousands of genes. Controlling the FWER (usually at 0.05) avoids excessive false positives (i.e. detecting differential expression in a gene that is not differentially expressed) but imposes a strict threshold for the p-value that results in many false negatives (many differentially expressed genes are overlooked). In contrast, controlling the pFDR by selecting the genes whose q-values fall below a chosen threshold lowers the number of false negatives (increases the statistical power) while keeping the expected proportion of false positives among all positive results low (e.g. 5%). [6]
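For example, suppose that among 10,000 genes tested, 1,000 are actually differentially expressed and 9,000 are not. If every gene with a p-value below 0.05 is declared differentially expressed, one expects about 450 (5%) of the 9,000 genes that are not differentially expressed to be reported anyway (false positives). Controlling the FWER at 0.05 instead keeps the probability of even one false positive at most 5%, but the correspondingly strict p-value threshold (0.05/10,000 = 0.000005 under the Bonferroni correction) means that many of the 1,000 differentially expressed genes are missed. Controlling the pFDR at 0.05, by declaring differentially expressed every gene with a q-value of at most 0.05, means that about 5% of the reported genes are expected to be false positives, while typically recovering far more of the truly differentially expressed genes than the FWER approach.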
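The three strategies can be compared on simulated data shaped like this example. The sketch below is a hypothetical illustration, not a result from the cited sources: it assumes each gene yields a one-sided z-statistic that is standard normal for the 9,000 genes that are not differentially expressed and normal with mean $\mu = 3$ (an arbitrary choice) for the 1,000 that are, and it computes q-values with a simplified Storey-type estimator using a fixed tuning parameter $\lambda = 0.5$. All function names and the seed are illustrative.

```python
import math
import random


def normal_sf(x):
    """Upper-tail probability 1 - Phi(x) of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def simulate_screen(n_null=9000, n_alt=1000, mu=3.0, seed=2):
    """Return (p_value, is_null) pairs for a hypothetical expression screen.

    Assumed model: each gene gives a one-sided z-statistic that is N(0, 1)
    if the gene is not differentially expressed and N(mu, 1) if it is.
    """
    rng = random.Random(seed)
    genes = [(normal_sf(rng.gauss(0.0, 1.0)), True) for _ in range(n_null)]
    genes += [(normal_sf(rng.gauss(mu, 1.0)), False) for _ in range(n_alt)]
    return genes


def q_values(pvalues, lam=0.5):
    """Simplified Storey-type q-values with a fixed tuning parameter lam."""
    m = len(pvalues)
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    q, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # largest p-value first
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


def tally(genes, reject):
    """Count (false positives, true positives) for a list of reject flags."""
    fp = sum(r and is_null for (_, is_null), r in zip(genes, reject))
    tp = sum(r and not is_null for (_, is_null), r in zip(genes, reject))
    return fp, tp


genes = simulate_screen()
pvals = [p for p, _ in genes]
m, alpha = len(pvals), 0.05
qvals = q_values(pvals)

rules = {
    "p < 0.05 (no correction)": [p < alpha for p in pvals],
    "Bonferroni, FWER 0.05": [p < alpha / m for p in pvals],
    "q <= 0.05": [q <= alpha for q in qvals],
}
for name, reject in rules.items():
    fp, tp = tally(genes, reject)
    print(f"{name}: {fp} false positives, {tp} of 1000 true positives")
```

Exact counts depend on $\mu$ and the seed; the point is the qualitative pattern: several hundred false positives without correction, few discoveries of any kind under Bonferroni, and many more true discoveries with only a small fraction of false positives under the q-value threshold.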