Question

DESeq confusing results

0

Entering edit mode

3.0 years ago

leranwangcs ▴ 150

Hi,

I used DESeq on bacterial data for differential abundance analysis, trying to find significantly different ASVs between PBS and Infant groups. I got 20 significantly different ASVs (padj < 0.05), and 8 non-significantly different ASVs (pads > 0.05).

Then I wanted to make boxplots on each of those ASVs to see how those ASVs distributed in the two groups, (I also included "mother" when making these plots but "mother" group was not included in the DESeq analysis). But the results confused me. Below is the boxplots for 20 significantly different ASVs:

enter image description here

Below is the boxplots for the 8 non-significantly different ASVs: enter image description here

What I can see from the second plots is that, those ASVs are all SIGNIFICANTLY DISFFERENT between PBS and Infant groups.

I don't understand why DESeq detected them as "NON-SIGNIFICANTLY DIFFERENT"??

Below is my code:

# # subset phloseq to only keep Infant and PBS
ps0.infant.pbs <- physeqBacteria %>%
                  subset_samples(sample_type != "mother")


# DESeq on Infant and PBS, then find the non-differentially abundant taxas (nothing have been removed)
ds.all <- phyloseq_to_deseq2(ps0.infant.pbs, ~ sample_type)

#compute geomeans
geoMeans <- apply(counts(ds.all),1,gm_mean)

#estimate size factors
ds.all <- estimateSizeFactors(ds.all,geoMeans = geoMeans)

# run deseq
dds.all <- DESeq(ds.all,fitType = "local")

#get deseq result
res.dds.all <- results(dds.all,cooksCutoff = FALSE)


# make a dataframe on the deseq result
res.dds.all <- as.data.frame(res.dds.all[ which(res.dds.all$baseMean >  baseMean), ]) %>%
                        rownames_to_column("OTU")


# pull non-sig table
deseq.non.sig <- res.dds.all %>%
                     subset(padj >= 0.05)



# pull non-sig ASV
deseq.non.sig.ASV <- res.dds.all %>%
                     subset(padj >= 0.05) %>%
                     pull("OTU")


# pull non-sig table
deseq.sig <- res.dds.all %>%
                     subset(padj < 0.05)



# pull sig ASV
deseq.sig.ASV <- res.dds.all %>%
                     subset(padj < 0.05) %>%
                     pull("OTU")



# make ground truth plot ready count table
replace_counts = function(physeq, dds) {

  table.list <- list()

  dds_counts = counts(dds, normalized = TRUE)
  if (!identical(taxa_names(physeq), rownames(dds_counts))) {
    stop("OTU ids don't match")
  }


  otu_table(physeq) = otu_table(dds_counts, taxa_are_rows = TRUE)
  return(physeq)

  }


  # Make deseq ready object
  ds.all <- phyloseq_to_deseq2(physeqBacteria, ~sample_type)


  # # only use below code if Deseq throws errors
  # # calculate the geom means (a normalization method)

  gm_mean = function(x, na.rm=TRUE){
      exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
    }

  # apply the geom means on count table
  geoMeans <- apply(counts(ds.all), 1, gm_mean)


  # estimate the size factors (median ratio method)
  # use sizeFactor() can check out the size factors
  ds.all <- estimateSizeFactors(ds.all, geoMeans = geoMeans)


  # core step of Deseq:differential expression analysis
  # betaPrior=TRUE: shrink non-informative LFCs to zero
  # betaPrior=FALSE: Default since v1.16 (v1.20 here). So the LFC shrinkage is not working here. (default)
  # fitType = "local":  when parametric curve doesn’t fit the observed dispersion mean relationship. (default is parametric)
  dds.all <- DESeq(ds.all,fitType = "local")


  # replace the count table in physeq to the normalized count table
  rlog.all <- replace_counts(physeqBacteria, dds.all)


  # test normalized count table
  df <-  counts(dds.all, normalized=TRUE)

  tax  = tax_table(as.matrix(taxa))
  otu  = otu_table(as.matrix(df),taxa_are_rows = TRUE) # rlog.all
  sam  = sample_data(rlog.all)
  otutax.deseq = phyloseq(otu, tax,sam)


# subset taxa based on the deseqASV
otutax.deseq <-  prune_taxa(taxa_names(otutax) %in% deseq.sig.ASV,otutax) #deseq.non.sig.ASV


# melt the deseq phylsoeq object
otutax.deseq.melt <- otutax.deseq %>%
                     psmelt()

otutax.deseq.melt$sample_type <- factor(otutax.deseq.melt$sample_type,levels = c("PBS","infant","mother"))

# make ground truth plot
p.deseq.groundTruth <- ggboxplot(otutax.deseq.melt, x="sample_type", y="Abundance", color = "Phylum") +
    geom_jitter(alpha = 0.7, width = 0.2, size = 1)+
    scale_y_log10() +
    facet_wrap(Family~ASVname)+
    labs(title = "Absolute Abundant of Differentially Abundant Taxa") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    stat_compare_means(label = "p.signif", hide.ns = TRUE) +
    labs(color='Phylum')

Thanks so much! Leran

DESeq • 1.3k views

ADD COMMENT • link updated 3.0 years ago by Asaf 10k • written 3.0 years ago by leranwangcs ▴ 150

0

Entering edit mode

You should include your relevant DESeq2 code in the post.

ADD REPLY • link 3.0 years ago by rpolicastro 13k

0

Entering edit mode

Thanks for the advice! Code posted!

Leran

ADD REPLY • link 3.0 years ago by leranwangcs ▴ 150

0

Entering edit mode

Also pretty sure DESeq2 is recommended over DESeq now in almost all cases.

ADD REPLY • link 3.0 years ago by jared.andrews07 ★ 18k

score 0 · Answer 1 · 2021-12-16

Using DESeq2 for differential composition of ASVs might be risky, there might not be a wide-enough common cohort of bacteria to normalize the data, there are other methods like ANCOM which might perform better (but will have trouble with zero or low counts).

I think the problem in the plots are the zero values turning into -Inf and not presented. You can try and use:

+ scale_y_continuous(trans=scales::pseudo_log_trans(base = 10)

Instead of scale_y_log10 so the zeros will have a 0 value.