I have a different question but to the same topic, so I hope I don't need to start a new question.
I have analysed the same data now both with the two-step normalization using DESeq2 but also with the erccdashboard package.
In the DESeq analysis I have got almost 400 genes with an adjustedp-value<=0.05
. Not one of these genes has a log2FC>0
!
This is because most of the genes with a positive log2FC has 'NA' in the padj column (~90 from ~6000 genes with log2FC>0 are !='NA' in the results list. All the other genes are NA).
If I am doing the DESeq2 analysis without the independent filtering (by setting it in the results
function to FALSE
), I am getting only 12 genes with an adjusted p-value<=0.05
.
As I was wondering why that is, I have also tested the erccdashboard with its QuasiSeq normalisation and differential expression analysis procedure. Here I am getting quite a different picture.
I have "only" 1720 genes with significant adjusted p-value <=0.05
, but around 50% are positive (log2FC>0
). Most of the statistically significant down-regulated genes are common to DESeq2 and erccdashboard, but none of the up-regulated genes in QuasiSeq.
Is there a reasonable explanation for such behaviour?
How come QuasiSeq calculate a significant p-values for different genes? Are the p-values in the erccdashboard result files being adjusted for multiple testing hypothesis?
Thanks
Assa
Well,...
The MA plots looks similar, both with the tendency to be below 0. This happens whether I apply the independent filtering or not in the
results
function. here are the two plots, the first is with the ind. filtering, the second without.I will try to run a manually independent filtering by removing the lowest 30% quantile or by removing all the genes with a summed read counts below 50/100 etc. I need to play a bit with the parameters to get better results.
OK this situation is quite peculiar and not really similar to what I suggested before. was this produced with deseq2 size factor normalization or did you use some custom normalization factors? is a widespread downregulation something you expect in your data?
Yes to both your questions.
We have expected a global down-regulation of transcription. We have added Spike-ins (ERCC) exactly to prove that fact. For that reason, I have first calculated the size Factors for the ERCC subset of the data and than used these values for the global analysis of all the genes (without ERCC). This is what gave me this behaviour).
If you expect global down-regulation, I would also recommend you turn off the beta prior, by setting
betaPrior=FALSE
when you callDESeq()
. The beta prior is useful for most experiments for producing LFC which are robust across experiments, but it relies on zero-centered prior distribution for the LFC.Another recommendation is to use LFC thresholds (see our paper and vignette), instead of
LFC = 0
for the null. Thresholds are useful for defining meaningful sets, when the null is clearly false for most genes.