Question

meta analysis of p values from deseq2 output

9.4 years ago

tonja.r ▴ 600

I have several RNA-seq studies in which the same null hypothesis was tested. I analyzed each study with DESeq2 and have p-values and FDR values as output. I would like to do a meta-analysis. By default DESeq2 produces two-sided p-values, but for the combine.test() function in R I need one-sided p-values. So my idea is to divide the FDR values by 2 and use them as combine.test(FDR/2). And to get back to two-sided test, I multiply by two the combined p values. Would this be a theoretically sound approach?

RNA-Seq • 4.5k views

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.4 years ago by tonja.r ▴ 600

Hint: The one-sided value isn't half the two-sided value. For example, if a two-sided value is 0.01 then one of the one-sided values will be ~1 and the other significant. So you'd need to decide which side to take.

ADD REPLY • link 9.4 years ago by Devon Ryan 105k

I guess I am either misunderstanding you or was wrong from the start. Assuming z-scores are given (so a normal distribution), one would calculate two-sided p-values as two.sided.p = 2*pnorm(-abs(z)) and then apply combine.test(two.sided.p/2), right?

From the DESeq2 paper:

For significance testing, DESeq2 uses a Wald test: the shrunken estimate of LFC is divided by its standard error, resulting in a z-statistic, which is compared to a standard normal distribution.

So, I could divide the FDR values by 2 to get one-sided p-values, couldn't I?

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 9.4 years ago by tonja.r ▴ 600

The one-sided p-value is half the two-sided one if the distribution of the test statistic is symmetric around 0. For example, assuming a Gaussian distribution, P(|x|>5)=P(x<-5)+P(x>5) and, because of symmetry, P(x<-5)=P(x>5)=0.5*P(|x|>5). What I think Devon is referring to is that then the one-sided value of the other alternative hypothesis is 1-0.5*P(|x|>5) e.g. P(x<5)= 1-P(X>5).
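To make the relationship concrete, here is a minimal Python sketch using only the standard library. It assumes a z-statistic that is standard normal under the null; the function name p_values_from_z is made up for illustration and is not part of DESeq2 or any R package.

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal: mean 0, sd 1


def p_values_from_z(z):
    """Return (two_sided, p_greater, p_less) for a z-statistic.

    Hypothetical helper for illustration only.
    """
    two_sided = 2 * norm.cdf(-abs(z))  # same idea as 2*pnorm(-abs(z)) in R
    p_greater = norm.cdf(-z)           # alternative: effect > 0 (upper tail)
    p_less = norm.cdf(z)               # alternative: effect < 0 (lower tail)
    return two_sided, p_greater, p_less


# z = 2.5758 is close to the 99.5% quantile, so the three values are
# roughly 0.01, 0.005 and 0.995.
print(p_values_from_z(2.5758))
```

Only the one-sided p-value whose alternative matches the sign of z equals half the two-sided value; the other one is 1 minus that half.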

ADD REPLY • link 9.4 years ago by Jean-Karim Heriche 27k

What I think Devon is referring to is that then the one-sided value of the other alternative hypothesis is 1-0.5*P(|x|>5) e.g. P(x<5)= 1-P(X>5).

That refers to the second part of the question, namely "And to get back to two-sided test, I multiply by two the combined p values.", doesn't it? In this paper I found the following:

After combining the P-values, if desired the resulting combined P can be again converted to a two-tailed test by multiplying it by two.

Or do you mean I need to use the log2FC to account for the direction, i.e. whether the gene is up- or down-regulated?

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 9.4 years ago by tonja.r ▴ 600

combine.test() implements Fisher's method and Stouffer's method to combine p-values. Fisher's statistic follows a chi-square distribution which is not symmetric so you can't multiply the resulting p-value by 2 in this case. With Stouffer's method, you can multiply the resulting p-value by 2 because you're dealing with a symmetric distribution (the Z transform statistic follows a normal distribution).
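The difference between the two methods can be sketched in plain Python. This is not the implementation behind combine.test(); it is an unweighted Stouffer combination and a textbook Fisher combination, and all function names are made up for illustration. It assumes the input p-values are one-sided, all test the same alternative, and lie strictly between 0 and 1. Note that simply doubling the combined one-sided p-value is only valid when it is below 0.5, so the sketch uses 2*Phi(-|z|), which doubles the smaller tail.

```python
from math import exp, log, sqrt
from statistics import NormalDist

norm = NormalDist()


def stouffer_one_sided(p_one_sided):
    """Unweighted Stouffer: return (combined z, combined one-sided p)."""
    z = [-norm.inv_cdf(p) for p in p_one_sided]  # small p -> large positive z
    z_comb = sum(z) / sqrt(len(z))
    return z_comb, norm.cdf(-z_comb)


def stouffer_two_sided(p_one_sided):
    """Two-sided p from the combined z, using the symmetry of the normal."""
    z_comb, _ = stouffer_one_sided(p_one_sided)
    return 2 * norm.cdf(-abs(z_comb))


def fisher(p_values):
    """Fisher: X = -2*sum(log p) is chi-square with 2k df under the null."""
    k = len(p_values)
    half_x = -sum(log(p) for p in p_values)  # X / 2
    # Closed-form upper tail of a chi-square with even df (2k):
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return exp(-half_x) * total


# Made-up one-sided p-values for the alternative "up-regulated" in 3 studies.
p_up = [0.005, 0.02, 0.60]
print(stouffer_one_sided(p_up), stouffer_two_sided(p_up), fisher(p_up))
```

Because the Stouffer statistic is signed, consistent effects in either direction give a small two-sided p-value. Fisher's statistic only grows as individual p-values shrink, so it has no second tail to account for by doubling.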

ADD REPLY • link 9.4 years ago by Jean-Karim Heriche 27k

Yes, that's exactly what I'm referring to, since there are two one-sided p-values, depending on the alternative hypothesis in question.

ADD REPLY • link 9.4 years ago by Devon Ryan 105k

So you'd need to decide which side to take.

If we divide a two-tailed p-value from DESeq2 by two, are we thereby selecting the one-tailed p-value that corresponds to the alternative hypothesis that gene expression changed in the direction it did? Is it appropriate to select the alternative hypothesis according to the observed direction of change?

I am also trying to combine p-values from multiple independent RNA-seq datasets and would like to use Stouffer's method, but want to be sure I am using the correct source of p-values.

ADD REPLY • link 4.8 years ago by volvicpellegrino1 • 0

Coming back to this, I think I have answered my own question.

I ran DESeq2's results() function three times, specifying "greaterAbs", "greater" and "less" as the altHypothesis argument, with lfcThreshold set to 0. I collected the p-values from each results object in a tibble and, depending on the direction of the fold change, divided the p-value in the "greater" or "less" column by the one in the "greaterAbs" column. The result is 0.5 for all genes.
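For a meta-analysis, the same alternative has to be used for a gene in every study, whatever direction it changed in each one. A minimal Python sketch of that conversion follows. It assumes a symmetric test statistic tested against a log fold change of 0 (as with lfcThreshold set to 0), and the function name and the example values are hypothetical.

```python
def one_sided_from_two_sided(p_two_sided, log2fc, alternative="greater"):
    """Convert a two-sided p-value to a one-sided one for a FIXED alternative.

    Hypothetical helper: the sign of log2fc decides whether the observed
    change lies in the direction of the chosen alternative.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = p_two_sided / 2
    if alternative == "greater":
        in_direction = log2fc > 0
    else:
        in_direction = log2fc < 0
    # Observed change matches the alternative: half the two-sided p-value.
    # Otherwise: the complementary tail.
    return half if in_direction else 1 - half


# Made-up values for one gene in two studies, both tested for "greater".
print(one_sided_from_two_sided(0.01, 1.3))   # up in study A
print(one_sided_from_two_sided(0.01, -1.3))  # down in study B
```

With this convention a gene that goes up in one study and down in another contributes opposing evidence, which a signed method such as Stouffer's can then cancel out.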

ADD REPLY • link 4.2 years ago by volvicpellegrino1 • 0