I am looking into differential expression analyses of RNA-Seq data. Having worked with arrays previously, I am quite used to using the FDR to adjust for multiple testing, and thus far I have always used 0.05 as the cutoff.
Looking into different ways to analyze the data, especially the DESeq2 package that several of you recommended, it seems to me that an adjusted p-value cutoff of 0.1 is the norm now.
I guess the answer is probably "it depends", but I can foresee reviewers questioning this...
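For context, this is where I keep seeing the 0.1: DESeq2's results() uses alpha = 0.1 by default, and the documentation suggests setting alpha to whatever cutoff you actually intend to use. A minimal sketch (dds stands for my fitted DESeqDataSet):

    library(DESeq2)
    dds <- DESeq(dds)                  # 'dds' assumed to be an existing DESeqDataSet
    res <- results(dds, alpha = 0.05)  # default is alpha = 0.1; pass the FDR cutoff you plan to use
    summary(res)                       # tallies genes with padj below alpha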
Think about what "false discovery rate" actually means. Whatever your FDR cutoff, that's the expected proportion of the genes you call differentially expressed that really aren't. So if you call 500 genes as DE with an FDR cutoff of 0.1, you expect 50 of them to be false positives. It all comes down to your tolerance for Type I vs. Type II errors. In my current project, we're using a very strict FDR of 0.01, because we want to be really sure that any gene we call is the real thing. If we had more tolerance for false positives in the name of discovery, we'd use 0.05 or 0.1. I've seen some very good projects that went all the way up to 0.25! But you have to calibrate it to the goals of the project.
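As a back-of-the-envelope sketch in R (res is assumed to be a DESeq2-style results table with a padj column of BH-adjusted p-values):

    alpha  <- 0.1                                  # FDR cutoff, chosen a priori
    called <- sum(res$padj < alpha, na.rm = TRUE)  # genes called differentially expressed
    called * alpha                                 # expected false positives among those calls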
One thing you should never do, IMO, is decide on the FDR cutoff based on how many positives you're getting. Decide on a cutoff a priori, and then the number of positives you get is, well, what you get. If you don't get anything at 0.1, sorry, that probably means your experiment just isn't producing significant results. If you get a bunch more than you were expecting at 0.05, that means your experimental condition is producing more DE than you thought. Either way, adjusting the cutoff after the fact is closely related to "p-hacking," and it's a terrible practice.
The second paragraph is just plain wrong for NGS and DE analysis. You should operationally define a cut-off as an informed statistician. This is not p-hacking or inappropriate data mining if you know what you are doing. To suggest that one needs to know exactly how to analyze one's data beforehand is antiquated and not applicable to biological studies. Nature does not conform to an analysis tool, and no analysis tool, especially for DE, is perfect. I work with human tissue and many diseases where we know certain markers are elevated. A predefined FDR may be overly stringent or overly permissive; strictly sticking to "well, I said 0.05 with BH, so I had better publish with that" instead of looking into why the FDR may not be appropriate is being lazy about your analysis.
Know your tools, know your math, know your data. If you don't have the time or ability to do this, you have no business publishing DE results.
I disagree; choosing your FDR cut-off after the results are known is no different (in practical terms) from choosing your critical p-value threshold after the results are known. It will help you, consciously or unconsciously, to claim associations where there are none.
IMO, control at FDR < 0.05 should be the standard unless you have a very good reason to move it. In the situation where all (or nearly all) null hypotheses are true, i.e., few or no genes are truly DE, controlling the FDR at 5% also controls the FWER at close to 5%, so our usual standards for false-positive 'experiments' are preserved.
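A quick sanity check of that connection (just a base-R sketch): when every null hypothesis is true, any BH rejection is by definition a family-wise error, so the empirical FWER lands near the FDR level:

    set.seed(1)
    any_call <- replicate(1000, {
      p <- runif(5000)                        # p-values with every null hypothesis true
      any(p.adjust(p, method = "BH") < 0.05)  # did BH make at least one (necessarily false) call?
    })
    mean(any_call)                            # empirical FWER, close to 0.05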
I totally agree with Daniel's first paragraph but not the second. There are many Mendelian diseases (like Rett Syndrome) where the phenotype is extremely different, yet the number of differentially expressed genes between a WT and a KO model organism is very small (~20 to 30 at most at FDR < 0.05). In such cases I think one needs to come up with a cut-off such that a biologist ends up with a reasonable number of genes to validate. I know it's not ideal, but not every dataset has fold changes like the cancer datasets.
In my opinion you can't look at the results and then decide what you think is significant, but you raise an important point: validation is a requirement, ideally in an independent cohort.
Getting rather off topic, but you caught my interest with the example of Rett Syndrome. In which tissue do you see such a small difference?
In most of the brain tissues, like the hypothalamus, cerebellum, striatum, and dentate gyrus. Typically, people use ANOVA rather than limma: limma gives you around 20-30 genes, whereas ANOVA gives you 100-200. The transcriptomic changes are very small, and most of the differentially expressed genes between WT and KO have a fold-change difference of about 20%. I had the same philosophy as yours, but my limited experience says it doesn't work for neurological-disorder and psychiatric datasets.
I don't believe there's anything wrong with using a more sensitive test in cases where the effect size is going to be small (such as ANOVA vs. limma in this case). I do believe that you should still set your FDR cutoff in advance.
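For what it's worth, here is a hedged limma sketch with the threshold fixed up front; expr and geno are placeholder names for a genes-by-samples expression matrix and a WT/KO genotype factor:

    library(limma)
    design <- model.matrix(~ geno)                                  # intercept + KO-vs-WT coefficient
    fit    <- eBayes(lmFit(expr, design))                           # moderated statistics help at small n
    deg    <- topTable(fit, coef = 2, number = Inf, p.value = 0.05) # keep genes with BH-adjusted p < 0.05
    nrow(deg)                                                       # how many genes survive the pre-set FDR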
Is that the case only with small sample sizes, or with larger ones as well? FYI, a side project of mine involves transcriptomics in frontotemporal dementia (alas, in lymphoblast cells).
You may use an FDR of 0.1 if the number of differentially expressed genes (DEGs) from DESeq2 is not large (say, on the order of 100 or fewer). An FDR of 0.1 means that about 10% of the called genes are expected to be false positives, i.e., if 100 genes are called DEGs then about 10 of them are false positives.
However, if the number of DEGs is large (at FDR < 0.05 or FDR < 0.1) or their p-values are very small, then tune your FDR threshold for the DEG analysis accordingly.
Take-home message: it all depends on the number of genes you get from the analysis and how many genes you or the biologist will need for validation.
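For example (a small sketch; padj is assumed to be the vector of BH-adjusted p-values, e.g. res$padj from DESeq2), you can tabulate how the DEG count changes across candidate thresholds:

    thresholds <- c(0.01, 0.05, 0.1)
    setNames(sapply(thresholds, function(a) sum(padj < a, na.rm = TRUE)),
             thresholds)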
For the second paragraph (deciding the cutoff a priori), I would like to upvote this more than once.
Hah! Thank you. All I can say is, it's hard-won knowledge.
WouterDeCoster: Mostly with small sample sizes, where n = 4 or 5 per genotype (in a model organism such as mice).