Hi,
I used DESeq on bacterial data for differential abundance analysis, trying to find significantly different ASVs between PBS and Infant groups. I got 20 significantly different ASVs (padj < 0.05), and 8 non-significantly different ASVs (pads > 0.05).
Then I wanted to make boxplots on each of those ASVs to see how those ASVs distributed in the two groups, (I also included "mother" when making these plots but "mother" group was not included in the DESeq analysis). But the results confused me. Below is the boxplots for 20 significantly different ASVs:
Below is the boxplots for the 8 non-significantly different ASVs:
What I can see from the second plots is that, those ASVs are all SIGNIFICANTLY DISFFERENT between PBS and Infant groups.
I don't understand why DESeq detected them as "NON-SIGNIFICANTLY DIFFERENT"??
Below is my code:
# # subset phloseq to only keep Infant and PBS
ps0.infant.pbs <- physeqBacteria %>%
subset_samples(sample_type != "mother")
# DESeq on Infant and PBS, then find the non-differentially abundant taxas (nothing have been removed)
ds.all <- phyloseq_to_deseq2(ps0.infant.pbs, ~ sample_type)
#compute geomeans
geoMeans <- apply(counts(ds.all),1,gm_mean)
#estimate size factors
ds.all <- estimateSizeFactors(ds.all,geoMeans = geoMeans)
# run deseq
dds.all <- DESeq(ds.all,fitType = "local")
#get deseq result
res.dds.all <- results(dds.all,cooksCutoff = FALSE)
# make a dataframe on the deseq result
res.dds.all <- as.data.frame(res.dds.all[ which(res.dds.all$baseMean > baseMean), ]) %>%
rownames_to_column("OTU")
# pull non-sig table
deseq.non.sig <- res.dds.all %>%
subset(padj >= 0.05)
# pull non-sig ASV
deseq.non.sig.ASV <- res.dds.all %>%
subset(padj >= 0.05) %>%
pull("OTU")
# pull non-sig table
deseq.sig <- res.dds.all %>%
subset(padj < 0.05)
# pull sig ASV
deseq.sig.ASV <- res.dds.all %>%
subset(padj < 0.05) %>%
pull("OTU")
# make ground truth plot ready count table
replace_counts = function(physeq, dds) {
table.list <- list()
dds_counts = counts(dds, normalized = TRUE)
if (!identical(taxa_names(physeq), rownames(dds_counts))) {
stop("OTU ids don't match")
}
otu_table(physeq) = otu_table(dds_counts, taxa_are_rows = TRUE)
return(physeq)
}
# Make deseq ready object
ds.all <- phyloseq_to_deseq2(physeqBacteria, ~sample_type)
# # only use below code if Deseq throws errors
# # calculate the geom means (a normalization method)
gm_mean = function(x, na.rm=TRUE){
exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
}
# apply the geom means on count table
geoMeans <- apply(counts(ds.all), 1, gm_mean)
# estimate the size factors (median ratio method)
# use sizeFactor() can check out the size factors
ds.all <- estimateSizeFactors(ds.all, geoMeans = geoMeans)
# core step of Deseq:differential expression analysis
# betaPrior=TRUE: shrink non-informative LFCs to zero
# betaPrior=FALSE: Default since v1.16 (v1.20 here). So the LFC shrinkage is not working here. (default)
# fitType = "local": when parametric curve doesn’t fit the observed dispersion mean relationship. (default is parametric)
dds.all <- DESeq(ds.all,fitType = "local")
# replace the count table in physeq to the normalized count table
rlog.all <- replace_counts(physeqBacteria, dds.all)
# test normalized count table
df <- counts(dds.all, normalized=TRUE)
tax = tax_table(as.matrix(taxa))
otu = otu_table(as.matrix(df),taxa_are_rows = TRUE) # rlog.all
sam = sample_data(rlog.all)
otutax.deseq = phyloseq(otu, tax,sam)
# subset taxa based on the deseqASV
otutax.deseq <- prune_taxa(taxa_names(otutax) %in% deseq.sig.ASV,otutax) #deseq.non.sig.ASV
# melt the deseq phylsoeq object
otutax.deseq.melt <- otutax.deseq %>%
psmelt()
otutax.deseq.melt$sample_type <- factor(otutax.deseq.melt$sample_type,levels = c("PBS","infant","mother"))
# make ground truth plot
p.deseq.groundTruth <- ggboxplot(otutax.deseq.melt, x="sample_type", y="Abundance", color = "Phylum") +
geom_jitter(alpha = 0.7, width = 0.2, size = 1)+
scale_y_log10() +
facet_wrap(Family~ASVname)+
labs(title = "Absolute Abundant of Differentially Abundant Taxa") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
stat_compare_means(label = "p.signif", hide.ns = TRUE) +
labs(color='Phylum')
Thanks so much! Leran
You should include your relevant DESeq2 code in the post.
Thanks for the advice! Code posted!
Leran
Also pretty sure DESeq2 is recommended over DESeq now in almost all cases.