9.4 years ago
tonja.r
I have several RNA-seq studies in which the same null hypothesis was tested. I analyzed each study with DESeq and have p-values and FDR values as output. I would like to do a meta-analysis. By default, DESeq2 produces two-sided p-values. For the combine.test() function in R I need one-sided p-values. So my idea is simply to divide the FDR values by 2 and use them as combine.test(FDR/2). To get back to a two-sided test, I would multiply the combined p-values by two. Is this theoretically the right approach?
Hint: The one-sided value isn't half the two-sided value. For example, if a two-sided value is 0.01 then one of the one-sided values will be ~1 and the other significant. So you'd need to decide which side to take.
I guess I am either misunderstanding you or I was wrong from the start. Assume z-scores are given (so, a normal distribution); to calculate two-sided p-values one would do:
two.sided.p = 2*pnorm(-abs(z))
and apply combine.test(two.sided.p/2), right? From the DESeq2 paper:
So, I could divide the FDR values by 2 to get one-sided p-values, couldn't I?
The one-sided p-value is half the two-sided one if the test statistic's distribution is symmetric around 0. For example, assuming a Gaussian distribution, P(|x|>5)=P(x<-5)+P(x>5), and because of symmetry, P(x<-5)=P(x>5)=0.5*P(|x|>5). What I think Devon is referring to is that the one-sided value for the other alternative hypothesis is then 1-0.5*P(|x|>5), e.g. P(x<5)=1-P(x>5).
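To make the symmetry argument concrete, here is a minimal base-R sketch. The z-score of 2.5 is an arbitrary illustrative value, and the test statistic is assumed to be standard normal under the null.

```r
# Two-sided and one-sided p-values for a standard normal test statistic.
z <- 2.5                                   # hypothetical observed z-score

p_two   <- 2 * pnorm(-abs(z))              # P(|X| > |z|)
p_upper <- pnorm(z, lower.tail = FALSE)    # P(X > z)
p_lower <- pnorm(z)                        # P(X < z)

# In the observed direction the one-sided p-value is half the two-sided one;
# in the opposite direction it is 1 minus that half.
observed_side <- if (z > 0) p_upper else p_lower
other_side    <- if (z > 0) p_lower else p_upper

isTRUE(all.equal(observed_side, p_two / 2))
isTRUE(all.equal(other_side, 1 - p_two / 2))
```

So halving the two-sided value gives only one of the two one-sided p-values, the one for the alternative that matches the sign of the statistic.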
This refers to the second part of the question, namely "And to get back to two-sided test, I multiply by two the combined p values.", doesn't it? In this paper I found the following:
Or do you mean I need to take the log2FC to account for the direction, i.e. whether the gene is up- or down-regulated?
combine.test() implements Fisher's method and Stouffer's method to combine p-values. Fisher's statistic follows a chi-square distribution which is not symmetric so you can't multiply the resulting p-value by 2 in this case. With Stouffer's method, you can multiply the resulting p-value by 2 because you're dealing with a symmetric distribution (the Z transform statistic follows a normal distribution).
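The difference between the two methods can be sketched by hand in base R. This is not the combine.test() implementation; it is an unweighted version written out for illustration, and the per-study z-scores are hypothetical.

```r
# Hypothetical per-study z-scores for one gene (sign = direction of change).
z <- c(2.1, 1.7, 2.4)
k <- length(z)

# One-sided p-values for the alternative "up-regulated" in every study.
p_up <- pnorm(z, lower.tail = FALSE)

# Stouffer (unweighted): convert p-values back to z-scores, sum them and
# rescale so the combined statistic is standard normal under the null.
z_comb <- sum(qnorm(p_up, lower.tail = FALSE)) / sqrt(k)
p_stouffer_one <- pnorm(z_comb, lower.tail = FALSE)

# The combined statistic is symmetric around 0, so a two-sided p-value is
# 2 * min(p, 1 - p), which equals 2 * p when z_comb is positive.
p_stouffer_two <- 2 * min(p_stouffer_one, 1 - p_stouffer_one)

# Fisher: -2 * sum(log(p)) follows a chi-square distribution with 2k degrees
# of freedom under the null. That distribution is not symmetric, so doubling
# this p-value has no comparable justification.
x2 <- -2 * sum(log(p_up))
p_fisher <- pchisq(x2, df = 2 * k, lower.tail = FALSE)
```

Note that both calculations take the one-sided p-values for the same alternative in every study, which is why the direction of change has to be fixed before combining.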
Yes, that's exactly what I'm referring to, since there are two one-sided p-values, depending on the alternative hypothesis in question.
If we divide a two tailed p-value from DESeq2 in two, are we thereby selecting the one-tailed p-value that corresponds to the alternative hypothesis of gene expression changing in the direction it did? Is this appropriate, selecting the alternative hypotheses that relate to the direction of change?
I am also trying to combine p values from multiple independent RNASeq datasets and would like to use Stouffer's method, but want to be sure of using the correct source of p-values.
Coming back to this, I think I have answered my own question.
I ran three versions of DESeq2's results() function, specifying "greaterAbs", "greater" and "less" as altHypothesis arguments. lfcThreshold was set to 0. I created a tibble of p values from each results object and, depending on the direction of fold change, divided the p value of the "greater" or "less" column by the "greaterAbs" column. The result is 0.5 for all genes.
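Given that result, the one-sided p-values can be derived from a single two-sided results table using the sign of the fold change. The sketch below uses a small mock data frame rather than a real DESeq2 object. The column names log2FoldChange and pvalue follow DESeq2's results output, the values are made up, and it assumes lfcThreshold = 0 and a symmetric test statistic as in the check above.

```r
# Mock stand-in for a DESeq2 results table (values are illustrative only).
res <- data.frame(
  gene           = c("geneA", "geneB", "geneC"),
  log2FoldChange = c(1.8, -0.9, 0.2),
  pvalue         = c(0.004, 0.03, 0.60)   # two-sided, unadjusted p-values
)

half <- res$pvalue / 2

# One-sided p-value for the alternative "up-regulated": half the two-sided
# value when the gene went up, 1 minus that half when it went down.
res$p_greater <- ifelse(res$log2FoldChange > 0, half, 1 - half)

# One-sided p-value for the alternative "down-regulated" is the complement.
res$p_less <- 1 - res$p_greater

res
```

For a meta-analysis, the same column (p_greater or p_less) would then be taken from every study, so that each gene is tested against one consistent alternative across datasets.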