Question

QuasiSeq vs. DESeq2 results

0

Entering edit mode

9.5 years ago

Assa Yeroslaviz ★ 1.9k

I have a different question but to the same topic, so I hope I don't need to start a new question.

I have analysed the same data now both with the two-step normalization using DESeq2 but also with the erccdashboard package.

In the DESeq analysis I have got almost 400 genes with an adjustedp-value<=0.05. Not one of these genes has a log2FC>0!

This is because most of the genes with a positive log2FC has 'NA' in the padj column (~90 from ~6000 genes with log2FC>0 are !='NA' in the results list. All the other genes are NA).

If I am doing the DESeq2 analysis without the independent filtering (by setting it in the results function to FALSE), I am getting only 12 genes with an adjusted p-value<=0.05.

As I was wondering why that is, I have also tested the erccdashboard with its QuasiSeq normalisation and differential expression analysis procedure. Here I am getting quite a different picture.

I have "only" 1720 genes with significant adjusted p-value <=0.05, but around 50% are positive (log2FC>0). Most of the statistically significant down-regulated genes are common to DESeq2 and erccdashboard, but none of the up-regulated genes in QuasiSeq.

Is there a reasonable explanation for such behaviour?

How come QuasiSeq calculate a significant p-values for different genes? Are the p-values in the erccdashboard result files being adjusted for multiple testing hypothesis?

Thanks
Assa

deseq2 normalisation quasiseq erccdashboard • 3.4k views

ADD COMMENT • link updated 23 months ago by Ram 44k • written 9.5 years ago by Assa Yeroslaviz ★ 1.9k

0

Entering edit mode

I think what is happening in your analysis with deseq2 is that the threshold for independent filtering is set unreasonably high. This happens if in general there is not much differential expression in your data. deseq2 tries to maximize the number of DEGs by excluding, in this case, a high proportion of genes from testing (having so a lighter multiple test correction). this is usually something you don't want. you said you performed a deseq2 analysis without independent filtering. that's the right way to improve your results, but you should also manually exclude some of the lowly expressed genes. otherwise, as you said, you get very few significant results. anyway, does the MA plot look good?

ADD REPLY • link 9.5 years ago by Martombo ★ 3.1k

0

Entering edit mode

Well,...

The MA plots looks similar, both with the tendency to be below 0. This happens whether I apply the independent filtering or not in the results function. here are the two plots, the first is with the ind. filtering, the second without.

screenshot-1

screenshot-2

I will try to run a manually independent filtering by removing the lowest 30% quantile or by removing all the genes with a summed read counts below 50/100 etc. I need to play a bit with the parameters to get better results.

ADD REPLY • link updated 23 months ago by Ram 44k • written 9.5 years ago by Assa Yeroslaviz ★ 1.9k

0

Entering edit mode

OK this situation is quite peculiar and not really similar to what I suggested before. was this produced with deseq2 size factor normalization or did you use some custom normalization factors? is a widespread downregulation something you expect in your data?

ADD REPLY • link updated 23 months ago by Ram 44k • written 9.5 years ago by Martombo ★ 3.1k

0

Entering edit mode

Yes to both your questions.

We have expected a global down-regulation of transcription. We have added Spike-ins (ERCC) exactly to prove that fact. For that reason, I have first calculated the size Factors for the ERCC subset of the data and than used these values for the global analysis of all the genes (without ERCC). This is what gave me this behaviour).

ADD REPLY • link updated 23 months ago by Ram 44k • written 9.5 years ago by Assa Yeroslaviz ★ 1.9k

2

Entering edit mode

If you expect global down-regulation, I would also recommend you turn off the beta prior, by setting betaPrior=FALSE when you call DESeq(). The beta prior is useful for most experiments for producing LFC which are robust across experiments, but it relies on zero-centered prior distribution for the LFC.

Another recommendation is to use LFC thresholds (see our paper and vignette), instead of LFC = 0 for the null. Thresholds are useful for defining meaningful sets, when the null is clearly false for most genes.

ADD REPLY • link updated 23 months ago by Ram 44k • written 9.5 years ago by Michael Love ★ 2.6k