Question

NO FDR <0.05 in the whole dataset

0

Entering edit mode

4.4 years ago

camillab. ▴ 160

Hi !

I am trying to calculate the FDR for the p.values in my bulk-RNAseq dataset (unfiltered, so on 21011 genes total, the dataset is composed by 2 conditions and it has 3 samples/condition) but I am not particularly strong in stats so I am confused in the output (so apologies if the question is "stupid"). I did used qvalue package and here is the code:

library(qvalue)
qobj <- qvalue(p = test$p_value)
summary(qobj) #summary showing no significant genes
hist(qobj)

obtaining the following results (from summary(qobj) ):

Call:
qvalue(p = test$p_value)

pi0:    0.4549629   

Cumulative number of significant calls:

          <1e-04 <0.001 <0.01 <0.025 <0.05 <0.1
p-value        4     88   837   1858  3284 5292
q-value        0      0     0      0     0   38
local FDR      0      0     0      0     0    0
             <1
p-value   21011
q-value   21011
local FDR 20988

and this is the histogram

so I have 2 questions:

Is it possible that I do not get any significant value (not even 1) ? (I did also try p.adjust(test$p_value,method="BH") obtaining again 0 significant FDR)
Is it wrong to calculate the FDR only on a smaller group of genes?

Basically looking here blog it seems that my pvalues are Bimodal (scenario C) and he suggests that

Don’t apply false discovery rate control to these p-values yet.

and one of the options he suggests is to

Do all the p-values close to 1 belong to some pathological case? An example from my own field: RNA-Seq data, which consists of read counts per gene in each a variety of conditions, will sometimes include genes for which there are no reads in any condition. Some differential expression software will report a p-value of 1 for these genes. If you can find problematic cases like these, just filter them out beforehand (it’s not like you’re losing any information!)

And in my dataset unfiltered I have genes, as he mentioned, there are genes with no reads (better, it has been assigned 0.1). Do you, in summary, recommend leaving it or subset the genes or..?

thank you in advance for all the suggestions

Camilla

R RNA-Seq FDR qvalue • 1.7k views

ADD COMMENT • link updated 4.4 years ago by rpolicastro 13k • written 4.4 years ago by camillab. ▴ 160

1

Entering edit mode

in my bulk-RNAseq dataset (unfiltered, so on 21011 genes total, the dataset is composed by 2 conditions and it has 3 samples/condition)

I think your description is leaving an obvious gap in the details: how was the statistical test conducted? Please specify which approach was used, e.g. DESeq2, edgeR, limma, or something else? If your result is not from one of these, you should first run a state-of-the-art aproach and check again. For example, if you used a simple t.test this would be the wrong approach. The named packages already provide FDR corrected qvalues, thus, given a standard approach is taken, there is no need to adjust p.values on your own. Please investigate that first and provide more background on the analysis.

ADD REPLY • link 4.4 years ago by Michael 56k

score 1 · Answer 1 · 2021-02-23

1

Entering edit mode

4.4 years ago

ATpoint 88k

Same problem as here: A: p.adjusted p value Experiment is underpowered and you most likely use inappropriate statistics (=generating the pvalues) for RNA-seq. Not more to say, with the given analysis strategy and power you have no DEGs after correction, no matter how you do it, the code for BH is correct though.