Question

Best criteria to find DE genes in RNA-seq analysis

Asked 8.6 years ago by hougiotaejut

Hi

I know that my question may seem silly to you, and I know I'm not as professional as you are, but please note that I HAVE RESEARCHED to find the best answer. Maybe my keywords weren't appropriate, because the more I searched on Google, and specifically on Biostars, the more confused I felt.

I have a list of p-values, log2 fold changes (log2FC) and standard deviations (SD) of the log2FCs for each gene in a time-course RNA-seq study, and I want to find the DE genes based on this information. From my research, some say we should use adjusted p-values to find DEGs, which means I should use the "p.adjust" function in R to adjust my current p-values. But I have also seen people use FDR and FC. On top of that, everyone recommends a different cutoff: one says "call DEGs with adjusted p-value < 0.1", another recommends adjusted p-value < 0.05.

I'm confused about the following:

1- Which is the best criterion for finding DEGs: p-values, adjusted p-values, FDR, or FC? I'm not sure whether FDR and FC can be used to find DEGs; I'm just repeating what I have read.

2- Whatever you recommend for question 1, what cutoff do you recommend? 0.05? 0.1? Something else?

3- If the p-value (or its adjusted value) is enough, what are the log2FoldChange and its SD good for? Why does the package I'm using report them alongside the p-values?

I apologize if my question is too basic or bothers you. Thanks for your help.

Tags: p-value, adjusted p-value, RNA-seq

adj.P < 0.05 is typical. You shouldn't need to adjust the p-values yourself if the package you use to call differential expression (DESeq2, edgeR, limma or whatever) already does this. The log2 FC gives you the direction of change. Some people additionally filter their DEGs by |log2 FC| > 0.5 or similar, but in my experiments this is usually unnecessary.

Reply by russhh, 8.6 years ago
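The filtering russhh describes can be sketched as below. This is a minimal illustration, assuming each gene already has an adjusted p-value (padj) from the DE package and a log2 fold change; the helper name classify_genes and the gene values are hypothetical and made up. The sign of log2FC gives the direction, and the |log2FC| cutoff is optional (it defaults to 0, i.e. no fold-change filter).

```python
from typing import Dict, List, Tuple


def classify_genes(
    results: List[Tuple[str, float, float]],
    alpha: float = 0.05,
    min_abs_lfc: float = 0.0,
) -> Dict[str, List[str]]:
    """Split genes into up- and down-regulated lists.

    results: (gene, padj, log2fc) tuples, where padj is an already-adjusted p-value.
    A gene is called DE if padj < alpha and |log2fc| > min_abs_lfc.
    """
    calls: Dict[str, List[str]] = {"up": [], "down": []}
    for gene, padj, lfc in results:
        if padj < alpha and abs(lfc) > min_abs_lfc:
            # The sign of the log2 fold change gives the direction of change.
            calls["up" if lfc > 0 else "down"].append(gene)
    return calls


# Hypothetical (gene, padj, log2FC) values, not real data.
example = [
    ("geneA", 0.001, 2.3),
    ("geneB", 0.030, -0.4),
    ("geneC", 0.200, 1.5),
    ("geneD", 0.010, 0.2),
]

print(classify_genes(example))                    # adjusted p-value only
print(classify_genes(example, min_abs_lfc=0.5))   # plus an |log2FC| > 0.5 filter
```

With the fold-change filter on, geneB and geneD drop out even though their adjusted p-values pass, which is exactly the trade-off discussed in this thread.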

Thank you so much for the info. The package gives me raw p-values, not adjusted ones, so I think I have to adjust them myself, since, as you say, adjusted p-values are typical. Is Bonferroni adjustment appropriate?

Reply by hougiotaejut, 8.6 years ago

Hi, just for your information, the FDR is an adjusted p-value. Bonferroni is more stringent than FDR, but at this point it shouldn't really matter which method you use.

Reply by Carlo Yague, 8.6 years ago
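To make the Bonferroni vs. FDR difference concrete, here is a minimal sketch of both adjustments following the textbook definitions that R's p.adjust() uses for method = "bonferroni" and method = "BH" (p.adjust's "fdr" is an alias for "BH"). The function names are hypothetical and the four raw p-values are made up.

```python
from typing import List


def bonferroni(pvals: List[float]) -> List[float]:
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg (FDR) adjusted p-values.

    The i-th smallest p-value is scaled by m / i; a running minimum is then taken
    from the largest p-value downward so the adjusted values stay monotone and <= 1.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # rank m (largest p) down to rank 1
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # made-up raw p-values
print("Bonferroni:", bonferroni(pvals))
print("BH:        ", benjamini_hochberg(pvals))
```

Here Bonferroni gives 0.04, 0.16, 0.12 and 0.02, so only two genes pass a 0.05 cutoff, while every BH-adjusted value is at most 0.04, so all four pass. With thousands of genes the gap is usually much larger.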

Oh, I get it now. So "FDR" is short for an FDR-adjusted p-value? I spotted that the "p.adjust" function in R has a method "fdr" for adjusting p-values. Good point. Thank you so much.

Reply by hougiotaejut, 8.6 years ago

You could think of the cutoff for an adjusted p-value (of 0.05 or 0.1) as a cost function. It depends on what you want to do with the data downstream, and how "expensive" it is to tolerate false positive findings.

Reply by WouterDeCoster, 8.6 years ago
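One way to see the cutoff as a cost is to count how many genes each cutoff calls and how many of those calls might be false. The sketch below assumes BH-adjusted p-values and uses the rough rule of thumb that, with an FDR cutoff q, about q × (number of calls) of the calls may be false positives; FDR control holds on average, so this is a guide, not a guarantee for any single experiment. The helper name and the padj values are made up.

```python
from typing import List, Tuple


def calls_at_cutoff(padj: List[float], q: float) -> Tuple[int, float]:
    """Return (number of genes with padj < q, rough expected number of false positives).

    Rule of thumb only: with BH-adjusted p-values the expected *proportion* of
    false discoveries among the calls is controlled at q, so q * n_called is an
    approximate count, not an exact bound for one dataset.
    """
    n_called = sum(1 for p in padj if p < q)
    return n_called, q * n_called


# Made-up BH-adjusted p-values for illustration only.
padj = [0.001, 0.02, 0.04, 0.06, 0.08, 0.3]

for q in (0.05, 0.1):
    n_called, approx_fp = calls_at_cutoff(padj, q)
    print(f"cutoff {q}: {n_called} genes called, roughly {approx_fp:.2f} expected false positives")
```

Moving from 0.05 to 0.1 buys two more candidate genes here at the price of more expected noise; whether that is worth it depends on how expensive a false lead is downstream.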

Accepted Answer · score 5 · 2017-01-03

Answered 8.6 years ago by Devon Ryan

1- Adjusted p-value, with further filtering by fold change if needed (i.e., if you have far too many results to handle, or far too many of them have small fold changes).
2- Either 0.1 or 0.05, depending on what you want to do next, how much noise you can tolerate, and how many significant genes you're getting.
3- A log2FC of 0.1 (as an example) is unlikely to be biologically relevant, so the fold change is useful for removing results that are statistically significant but unlikely to matter biologically.

Another useful way to do this (log2 FC filtering) is to change the null hypothesis from "the log2 fold change equals zero" to "the absolute log2 fold change is at most 0.1 (or 0.2)". This takes care of point 3 in a statistically principled way. I know this can be done in DESeq2, but I'm not sure about other packages.

DESeq2::results(object = dds, lfcThreshold = 0.2)

Reply by poisonAlien, 8.6 years ago
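The idea behind a non-zero null can be sketched with the log2FC and its standard error, which also answers question 3: the p-value is computed from them. This is a simplified Wald-type test of H0: |log2FC| ≤ threshold, assuming the log2FC estimate is roughly normal and that the "SD of log2FC" in the question is its standard error. It is similar in spirit to DESeq2's lfcThreshold with altHypothesis = "greaterAbs", but it is not DESeq2's code; limma's treat() and edgeR's glmTreat() implement a related approach (TREAT). The function names and numbers are hypothetical.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def wald_pvalue(lfc: float, se: float, threshold: float = 0.0) -> float:
    """Simplified Wald-type p-value for H0: |log2FC| <= threshold.

    threshold = 0 gives the ordinary two-sided Wald test of log2FC = 0.
    threshold > 0 only counts the part of |log2FC| above the threshold as
    evidence; 2 * upper_tail(...) is a conservative p-value under that null.
    """
    z = (abs(lfc) - threshold) / se
    return min(1.0, 2 * upper_tail(z))


# A small but very precisely estimated change vs. a large change (made-up numbers).
for lfc, se in [(0.15, 0.03), (1.0, 0.2)]:
    p0 = wald_pvalue(lfc, se)
    p_thr = wald_pvalue(lfc, se, threshold=0.2)
    print(f"log2FC={lfc}, SE={se}: p (null LFC=0) = {p0:.2g}, p (null |LFC|<=0.2) = {p_thr:.2g}")
```

Both genes look equally significant against a null of zero, but only the large change survives once the null is |log2FC| ≤ 0.2, which is the statistical version of discarding tiny fold changes.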

Thank you so much. I need to compare two models to check which one is better at detecting DE genes. So I think adjusted p-values are enough, as you say, right? If I have understood you correctly, the extra fold-change filtering when there are too many results is for answering a biological question. In my case, comparing two DEG detection models, the adjusted p-value is enough. Am I right? And is Bonferroni adjustment appropriate?

Reply by hougiotaejut, 8.6 years ago

If you're comparing two models to see which one is "more capable of detecting DE genes", you have to watch for false positives and overall accuracy. You could build a terrible model that just reports p = 0.0001 for every second gene, and it would "be more capable of detecting DE genes" than any accurate method. Try to build a known-truth list of genes first.

Reply by karl.stamm, 8.6 years ago
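Given a known-truth list (for example from simulated data, which is an assumption here), two models can be compared on false positives and accuracy rather than on the raw number of DE calls. A minimal sketch; the function name, gene names and both models' calls are hypothetical.

```python
from typing import Dict, Set


def evaluate_calls(called: Set[str], truly_de: Set[str], all_genes: Set[str]) -> Dict[str, float]:
    """Confusion counts plus precision, recall and false discovery proportion."""
    tp = len(called & truly_de)   # correctly called DE
    fp = len(called - truly_de)   # called DE but not truly DE
    fn = len(truly_de - called)   # truly DE but missed
    tn = len(all_genes - called - truly_de)
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "tn": tn,
        "precision": tp / len(called) if called else 0.0,
        "recall": tp / len(truly_de) if truly_de else 0.0,
        "false_discovery_proportion": fp / len(called) if called else 0.0,
    }


all_genes = {f"g{i}" for i in range(10)}
truth = {"g0", "g1", "g2"}                          # hypothetical known-truth DE genes
model_a = {"g0", "g1", "g5"}                        # a reasonable model
model_b = {f"g{i}" for i in range(0, 10, 2)}        # "every second gene" model

for name, calls in [("A", model_a), ("B", model_b)]:
    print(name, evaluate_calls(calls, truth, all_genes))
```

Model B reports more DE genes but finds no more true ones than model A, and 3 of its 5 calls are false, which is the trap described above.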

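What karl.stamm wrote. Regarding Bonferroni correction, it's overly conservative. The BH method in p.adjust() (method = "BH", or its alias "fdr") is the better choice; note that p.adjust()'s actual default is "holm", so set the method explicitly.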

Reply by Devon Ryan, 8.6 years ago