New to NGS analysis, but that's the task I've been assigned. I have received NGS data that I am trying to decipher.
I'm trying to learn what exactly is meant by "unadjusted p-value" and "FDR" when looking at per-gene t-test comparisons (the comparisons are between NGS of animals treated with drug or placebo). I understand the basic concepts, but not how to make practical use of them. Most of the values seem fairly large (well over 0.1 for p-values, in the 0.1 to 0.9 range for FDR) when looking at data sets of ~20,000 to 40,000 genes. My goal here is to determine a cutoff for each that would allow me to gate on the genes with meaningful expression differences. Is there a specific value I should use as the boundary, or some way to calculate it based on the sample size or something similar?
This. I see so many people making the mistake of assuming a low p-value is a large effect size.
Isn't the way to combat that to just lower your threshold for calling something significant?
Never confuse statistical significance and biological relevance.
No. A p-value is a measure of significance, so it depends on variation and sample consistency as much as on the size of the difference. If all the drug-treated animals were at 102.1% expression plus or minus 0.001, you would have high certainty of a difference without much biological relevance; compare that to another gene at 300% plus or minus 50. As Devon said, use FDR to gate, then sort for high fold change. The two will be correlated, but they are not the same thing.
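To make the "gate on FDR, then sort by fold change" advice concrete, here is a minimal sketch in plain Python. It assumes the FDR column was produced by a Benjamini-Hochberg adjustment, which is common but not confirmed anywhere in this thread. The gene names, p-values, fold changes, and the 0.05 cutoff are all made up for illustration; the cutoff is a convention, not a rule.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def gate_and_rank(genes, fdr_cutoff=0.05):
    """Keep genes passing the FDR cutoff, sorted by |log2 fold change|.

    genes: list of (name, unadjusted_p, fold_change) tuples, where
    fold_change is the treated/placebo ratio (1.0 means no change).
    """
    fdr = benjamini_hochberg([p for _, p, _ in genes])
    hits = [(name, q, fc) for (name, _, fc), q in zip(genes, fdr)
            if q <= fdr_cutoff]
    return sorted(hits, key=lambda h: abs(math.log2(h[2])), reverse=True)

# Hypothetical data: geneA is the "102.1% with tiny variance" case,
# geneB is the "300% with larger variance" case.
genes = [
    ("geneA", 0.0001, 1.021),
    ("geneB", 0.004, 3.0),
    ("geneC", 0.02, 0.4),
    ("geneD", 0.2, 1.5),
    ("geneE", 0.6, 0.9),
]

fdr = benjamini_hochberg([p for _, p, _ in genes])
ranked = gate_and_rank(genes)
for name, q, fc in ranked:
    print(f"{name}\tFDR={q:.4f}\tfold change={fc}")
```

In this toy set geneA has the smallest FDR but ends up last after sorting, because its fold change is tiny. With a real table you would more likely use an existing implementation (for example `p.adjust` in R or `multipletests` in statsmodels) rather than hand-rolling the adjustment.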