Comparing P-Values & FDR adjusted p-values between RNA-Seq experiments?
17 months ago
Saran ▴ 50

Hello,

I ran my RNA-Seq comparisons individually:

  1. Co-Infection RSV & Bacteria versus Control
  2. RSV Infection versus Control

I used the same parameters for both experiments and used edgeR to TMM-normalize and filter the counts based on expression. This filtering retains a different set of genes in each experiment. I then used limma-voom for the DGE analysis, which gives lists of p-values and FDR-adjusted p-values. I have now been told to compare the p-values and adjusted p-values between experiments, but I am not sure this is appropriate, as they are two separate experiments. We want to see whether co-infection produces much higher significance in most of the genes. Is it inappropriate to compare these, and should I instead just compare the average expression?

Thanks, Sara

Limma EdgeR Voom RNA-Seq RNA • 1.7k views

I feel like "much higher significance" is impossible to prove. You would want to see whether the magnitude of the fold change increases or decreases, or whether a new set of genes is differentially regulated. But comparing p-values doesn't make sense to me.


Thank you, completely agree.


If you make a volcano plot for each of these experiments, you may be able to show them why the p-values cannot be compared, and at the same time perhaps identify a more interesting and actually relevant comparison.

17 months ago
Gordon Smyth ★ 7.7k

The approach I usually take is to make a scatterplot of genewise logFC for one experiment vs logFC for the other. I would include all expressed genes on the plot and perhaps color-code genes that are significantly DE in one or both of the experiments. The 1-1 line can be added to the plot as a reference.

Another possibility is a scatterplot of moderated t-statistics instead of logFC. This plot is most meaningful if the two experiments have a similar number of replicates and statistical power.
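As a rough sketch of the data preparation behind such a plot (not the poster's own code), the snippet below joins two per-gene result tables and labels each gene by where it is significant. The gene names, values, the 0.05 FDR cutoff, and the helper `scatter_points` are all hypothetical, and restricting to genes present in both tables is an assumption made here because the two experiments were filtered separately.

```python
# Hypothetical per-gene results: gene -> (logFC, FDR-adjusted p-value).
coinfection = {
    "geneA": (2.1, 0.001),
    "geneB": (0.3, 0.60),
    "geneC": (1.8, 0.02),
    "geneD": (-1.2, 0.04),
}
rsv_only = {
    "geneA": (1.9, 0.003),
    "geneB": (0.2, 0.70),
    "geneC": (0.4, 0.45),
    "geneE": (1.1, 0.03),
}


def scatter_points(exp_x, exp_y, fdr=0.05):
    """Return (gene, logFC_x, logFC_y, category) for genes found in both tables."""
    points = []
    for gene in sorted(exp_x.keys() & exp_y.keys()):
        lfc_x, padj_x = exp_x[gene]
        lfc_y, padj_y = exp_y[gene]
        sig_x = padj_x < fdr
        sig_y = padj_y < fdr
        if sig_x and sig_y:
            category = "both"
        elif sig_x:
            category = "x_only"
        elif sig_y:
            category = "y_only"
        else:
            category = "neither"
        points.append((gene, lfc_x, lfc_y, category))
    return points


# x axis: RSV vs control, y axis: co-infection vs control.
points = scatter_points(rsv_only, coinfection)
for point in points:
    print(point)
```

Each tuple can then be passed to any plotting library, with `category` mapped to a color and the line y = x drawn as the 1-1 reference.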

17 months ago
LauferVA 4.5k

I have an answer that subsumes most of what has been written: use the test statistic itself.

When it comes down to it, what are we doing when we filter by p-value, adjusted p-value, logFC, logFC stdErr, etc? We are making a heuristic that is intended to help pull out interesting data - that's all.

The problem is, these have different strengths and weaknesses - let's illustrate.

Problems with p-adj and p-val: Let's say you download 3 studies of the same phenotype and find a gene that has a very low p-value in all of them. Great, right? Not necessarily. What if the p-value is low, but the logFC is positive in one study and negative in another? Even though the p-value is significant, the gene might not mean anything if it is up in one study and down in the next - it may be a fluke.

Problems with logFC: If you filter by logFC alone and ignore the logFC standard error, you may just be enriching for low-quality, highly variable data. A logFC of 10 is meaningless if its standard error is 6. Indeed, in this case the p-value would not be significant: the ratio of the logFC to its standard error is 10/6, so the test statistic is only about 1.67, N.S.
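To make the 10/6 arithmetic concrete, here is a minimal sketch that computes the ratio and a two-sided p-value. The standard normal reference distribution is an assumption made for illustration; limma's moderated t-statistics are compared against a t-distribution instead, so the exact p-value from a real analysis would differ. The function names are hypothetical.

```python
import math


def wald_statistic(log_fc, std_err):
    """Estimate divided by its standard error."""
    return log_fc / std_err


def two_sided_p_normal(z):
    """Two-sided p-value under a standard normal reference (an assumption)."""
    return math.erfc(abs(z) / math.sqrt(2))


z = wald_statistic(10.0, 6.0)
p = two_sided_p_normal(z)
print(f"statistic = {z:.3f}, two-sided p = {p:.3f}")
```

Under this approximation the p-value stays above the usual 0.05 threshold, in line with the "N.S." conclusion above.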

For these reasons, I don't use p-value, padj, or logFC alone - but I will make compound filters that use them together.

However, there is one metric that addresses all of these problems together: the test statistic itself (this could be an LRT, Wald, or score statistic). A Wald statistic is calculated by dividing the estimate by its standard error (in this case logFC / logFC std. err.), and the p-value is then calculated directly from it. As such, it is a go-between that relates all of the others.

  • It gives the direction of effect, because logFC / logFC std. err. can be positive or negative.
  • It gives the significance, because the value of the statistic (with its reference distribution) is sufficient to calculate the p-value.
  • It doesn't give the magnitude of the effect in real terms, but I usually eliminate results with abs(logFC) < 1 to begin with, so that matters less: I know I am at least looking at things that differ by a ratio of 2:1 or more.
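The points above could be combined roughly as follows. This is only a sketch under stated assumptions: the two study tables, their values, and the helper `summarize` are hypothetical, and ranking by the weaker of the two statistics is one illustrative choice rather than the method described in this answer.

```python
# Hypothetical per-gene results: gene -> (logFC, logFC standard error).
study_1 = {
    "geneA": (2.0, 0.4),
    "geneB": (10.0, 6.0),
    "geneC": (1.5, 0.3),
    "geneD": (0.5, 0.05),
}
study_2 = {
    "geneA": (1.8, 0.5),
    "geneB": (8.0, 5.0),
    "geneC": (-1.4, 0.3),
    "geneD": (0.6, 0.05),
}


def summarize(study_a, study_b, min_abs_lfc=1.0):
    """Return (gene, stat_a, stat_b, same_direction) for genes passing the logFC filter."""
    rows = []
    for gene in sorted(study_a.keys() & study_b.keys()):
        lfc_a, se_a = study_a[gene]
        lfc_b, se_b = study_b[gene]
        # Drop small effects first, mirroring the abs(logFC) < 1 pre-filter.
        if abs(lfc_a) < min_abs_lfc or abs(lfc_b) < min_abs_lfc:
            continue
        stat_a = lfc_a / se_a
        stat_b = lfc_b / se_b
        same_direction = (stat_a > 0) == (stat_b > 0)
        rows.append((gene, round(stat_a, 2), round(stat_b, 2), same_direction))
    # Concordant genes first, then by the weaker of the two statistics.
    rows.sort(key=lambda r: (not r[3], -min(abs(r[1]), abs(r[2]))))
    return rows


rows = summarize(study_1, study_2)
for row in rows:
    print(row)
```

In this toy data, geneD is removed by the effect-size filter despite its tiny standard error, geneB survives the filter but has weak statistics, and geneC is flagged because its sign flips between studies.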