Question

Volcano plot: why is there big FC with big p-values?

0

5.1 years ago

i.am.filippov • 0

I'm looking at a tutorial about analysing differential expression from microarray data. limma is used to detect differentially expressed genes.

Now if you look at:

here

a bigger log fold change corresponds to a smaller p-value, i.e. a bigger FC is more significant. But why would different genes at the same FC level have different p-values? How is the big spread explained? Does this question make sense?

Thanks!

R gene • 4.6k views

ADD COMMENT • link updated 5.1 years ago by Makplus T ▴ 100 • written 5.1 years ago by i.am.filippov • 0

1

Two simple explanations are larger within-treatment variance (e.g. counts for four treatment 1 samples are 2,2,2,2 and counts for four treatment 2 samples are 8,0,0,0), or differences in the absolute level of the counts (e.g. 1 vs 2 or 100 vs 200, which are both a twofold change).

ADD REPLY • link 5.1 years ago by h.mon 35k

2

5.1 years ago

Makplus T ▴ 100

It seems you have the idea that a bigger fold change should always give a smaller p-value.
But the p-value and the fold change are not necessarily related. The fold change only reflects the change in the mean, whereas the p-value depends not only on the means but also on the variance (for example, if you perform a two-sample Student's t-test).
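To make the point about variance concrete, here is a minimal sketch using only the Python standard library. It computes Welch's two-sample t statistic (a close relative of the Student's t-test mentioned above, not limma's moderated statistic) for two hypothetical genes. The expression values are made up for illustration: both genes have the same log2 fold change of 2, but one is tightly replicated and the other is noisy.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic: difference in means divided by its standard error."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical log2 expression values (invented for this example).
# In both genes the group means are 6 and 8, so log2FC = 2.
tight_ctrl, tight_trt = [5.9, 6.0, 6.0, 6.1], [7.9, 8.0, 8.0, 8.1]
noisy_ctrl, noisy_trt = [4.0, 5.0, 7.0, 8.0], [6.0, 7.0, 9.0, 10.0]

for name, ctrl, trt in [("tight", tight_ctrl, tight_trt),
                        ("noisy", noisy_ctrl, noisy_trt)]:
    log2fc = mean(trt) - mean(ctrl)
    print(f"{name}: log2FC = {log2fc:.2f}, t = {welch_t(trt, ctrl):.2f}")
```

The tight gene gets a t statistic of roughly 35 and the noisy gene roughly 1.5, so at the same fold change the first would sit far higher on the volcano plot than the second.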

score 9 · Accepted Answer · 2019-10-12

The smaller the counts of a gene (or whatever you measure) are, the more unreliable they are and the more prone these counts are to show large fold changes.

Let's have an example:

A gene had 10 counts in sampleA and 2 counts in sampleB. That makes a fold change of 5, right? Say another gene had 1000 counts in A and 200 in B, also FC = 5. Which is more reliable? I would say the second one. Imagine you have small fluctuations of the counts because of the inherent uncertainty / error rate of sequencing and the quantification method. Say the first gene now had only 5 counts in A and 4 in B (5 fewer in A, 2 more in B): the FC is now 1.25 instead of 5. If the second gene had the same fluctuation, so 995 in A and 202 in B, the FC is now about 4.93, so still very close to 5. High counts are more resistant to small fluctuations. => If the mean (the average counts for the gene) is low, fold changes can easily be high (but unreliable). As far as I know this holds true for every kind of experiment in which quantities are measured.
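The arithmetic in this example can be checked with a few lines of Python. The helper names `fold_change` and `perturbed_fc` are just illustrative. The fluctuation of 5 fewer counts in A and 2 more in B is the one used in the example.

```python
def fold_change(a, b):
    """Plain ratio of counts in sample A over sample B."""
    return a / b


def perturbed_fc(a, b, da=-5, db=2):
    """Fold change after shifting A by da counts and B by db counts."""
    return fold_change(a + da, b + db)


for a, b in [(10, 2), (1000, 200)]:
    print(f"A={a}, B={b}: FC {fold_change(a, b):.2f} -> {perturbed_fc(a, b):.2f}")
# prints:
# A=10, B=2: FC 5.00 -> 1.25
# A=1000, B=200: FC 5.00 -> 4.93
```

The same absolute fluctuation wipes out most of the fold change for the low-count gene and barely moves it for the high-count gene.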

Long story short: Low counts tend to show artificially high (and often false) fold changes, therefore the confidence in them is low and therefore p-values tend to be large. You would need more replicates to have the power to detect differential genes with low counts compared to genes with high counts. That is why statistical power is inherently greater for highly-expressed than lowly-expressed genes.
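One rough way to quantify this, assuming the counts behave like independent Poisson variables with means $\mu_A$ and $\mu_B$ (that assumption is mine, not part of the answer above, and real data usually have extra biological variance on top), is the delta-method approximation for the variance of the log fold change:

```latex
\operatorname{Var}\!\left[\log\frac{A}{B}\right] \approx \frac{1}{\mu_A} + \frac{1}{\mu_B}
```

Plugging in the example, 10 vs 2 counts gives a variance of about $0.1 + 0.5 = 0.6$ (standard error about 0.77 on the natural-log scale), while 1000 vs 200 gives about $0.001 + 0.005 = 0.006$ (standard error about 0.077). The log fold change is $\log 5 \approx 1.61$ in both cases, so it is only about 2 standard errors from zero for the low-count gene but about 21 for the high-count gene. The approximation is crude at counts as low as 2, so treat the first figure as an order-of-magnitude guide only.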