Hi !
I am trying to calculate the FDR for the p.values in my bulk-RNAseq dataset (unfiltered, so on 21011 genes total, the dataset is composed by 2 conditions and it has 3 samples/condition) but I am not particularly strong in stats so I am confused in the output (so apologies if the question is "stupid"). I did used qvalue
package and here is the code:
library(qvalue)
qobj <- qvalue(p = test$p_value)
summary(qobj) #summary showing no significant genes
hist(qobj)
obtaining the following results (from summary(qobj)
):
Call:
qvalue(p = test$p_value)
pi0: 0.4549629
Cumulative number of significant calls:
<1e-04 <0.001 <0.01 <0.025 <0.05 <0.1
p-value 4 88 837 1858 3284 5292
q-value 0 0 0 0 0 38
local FDR 0 0 0 0 0 0
<1
p-value 21011
q-value 21011
local FDR 20988
and this is the histogram
so I have 2 questions:
Is it possible that I do not get any significant value (not even 1) ? (I did also try
p.adjust(test$p_value,method="BH")
obtaining again 0 significant FDR)Is it wrong to calculate the FDR only on a smaller group of genes?
Basically looking here blog it seems that my pvalues are Bimodal (scenario C) and he suggests that
Don’t apply false discovery rate control to these p-values yet.
and one of the options he suggests is to
Do all the p-values close to 1 belong to some pathological case? An example from my own field: RNA-Seq data, which consists of read counts per gene in each a variety of conditions, will sometimes include genes for which there are no reads in any condition. Some differential expression software will report a p-value of 1 for these genes. If you can find problematic cases like these, just filter them out beforehand (it’s not like you’re losing any information!)
And in my dataset unfiltered I have genes, as he mentioned, there are genes with no reads (better, it has been assigned 0.1). Do you, in summary, recommend leaving it or subset the genes or..?
thank you in advance for all the suggestions
Camilla
I think your description is leaving an obvious gap in the details: how was the statistical test conducted? Please specify which approach was used, e.g. DESeq2, edgeR, limma, or something else? If your result is not from one of these, you should first run a state-of-the-art aproach and check again. For example, if you used a simple t.test this would be the wrong approach. The named packages already provide FDR corrected qvalues, thus, given a standard approach is taken, there is no need to adjust p.values on your own. Please investigate that first and provide more background on the analysis.