Question

Genes looking differential abundant are not accoring to DESeq2

0

Entering edit mode

3.0 years ago

robert.murphy ▴ 110

I have a metagenomic dataset crossing three time points from which I have mined CAZymes and am using DESeq2 to identify differentially abundant CAZymes from using trimmed mean depth generated my CoverM (very similar to Q2Q3 contig coverage). From this I am see results I do not understand in the form of enzymes that look differentially abundant but DESeq2 says the are not. below is an example. I would appriciate anyone who could enlighten me as to why this is the case.

When looking at a relative abundance* heatmap of a subset of the CAZyme data with those marked as differentially abundant across any time point I notice on the far right several enzymes which appear to the naked eye to be differentially abundant. Endo−1,4−beta−mannanase for example on the far right

Relative abundance heatmap

When exploring the DESeq2 output for that given enzyme I see the following:

#Between time point 1 and 2    
PreVsP_df[rownames(PreVsP_df) == "3.2.1.78",]
              baseMean log2FoldChange     lfcSE      stat    pvalue      padj
    3.2.1.78 0.5565868      0.2389833 0.8940182 0.2673137 0.7892277 0.8563372

#Between time point 1 and 3
PreVsF_df[rownames(PreVsF_df) == "3.2.1.78",]
          baseMean log2FoldChange     lfcSE      stat    pvalue      padj
3.2.1.78 0.5565868      -1.132044 0.9063071 -1.249074 0.2116381 0.3301718

As we see here there is no significant difference. However when I explore relative abundance values of this enzyme individually I do not understand why this is the case. Pre = time point 1, Post = time point 2 and Field = time point 3:

enter image description here

Deseq2 R • 1.5k views

ADD COMMENT • link updated 3.0 years ago by Istvan Albert 102k • written 3.0 years ago by robert.murphy ▴ 110

1

Entering edit mode

Can you show an MA-plot? Use the plotMA() function. The baseMean looks very low, that might be an issue here.

ADD REPLY • link 3.0 years ago by ATpoint 88k

0

Entering edit mode

Apologies for delayed response. Here is the plotMA for the whole DESeq2 results: all_dds

Here is the plotMA results results(dds_cazyme, contrast = c("Timepoint", "PRE", "POST")). That being timepoint 1 and 2:

prepost plotMA

ADD REPLY • link 3.0 years ago by robert.murphy ▴ 110

0

Entering edit mode

here is also the plotCounts() of the gene: enter image description here

ADD REPLY • link 3.0 years ago by robert.murphy ▴ 110

0

Entering edit mode

It seems to be low counts and quite some variability, so not enough that DESeq2 considers this as differential. By the way, I have no idea why this MA-plot looks so weird and asymmetric, but I recall mutliple threads in which the DESeq2 developer expressed serious concerns on even using DESeq2 for abundance analysis as he (afair) doubts that this is a good fit for a negative binomial that DESeq2 builds on.

ADD REPLY • link 3.0 years ago by ATpoint 88k

score 4 · Accepted Answer · 2022-04-29

4

Entering edit mode

3.0 years ago

Istvan Albert 102k

The method expects your counts to be integer numbers preferably with counts well above zero that can be modeled as a distribution.

In contrast, your numbers seem to have a very low base mean.

Relative abundance is sensitive to errors especially when one small number is divided by another small number.

The absolute error of measurement will affect smaller numbers more than larger ones.

Long story short the method is not designed for this type of data.

ADD COMMENT • link 3.0 years ago by Istvan Albert 102k

0

Entering edit mode

This makes sense, but it was my understanding that many paper do use it for this function.