Why is DESeq2 normalization making my top feature have identical values across samples?
7 months ago
DNAngel ▴ 250

Hi all,

I'm using DESeq2 to normalize my count dataset, which has about 90 samples and 252 taxa. I need this for WGCNA analyses, which I have run many times before on species abundance datasets without issue.

This time, however, I'm running into a strange problem that I can't explain. I have two datasets, bacteria and fungi, and normally I combine them into one complete microbiome dataset.

I have already identified the top species (i.e., based on abundance and their contribution scores).

What's strange is that when I normalize the datasets separately, the numbers look fine. But when I normalize the combined bacteria+fungi dataset, whether I use estimateSizeFactors followed by estimateDispersions and then nbinomWaldTest, or varianceStabilizingTransformation, my top species (a fungus) ends up with identical values across all samples in the normalized count matrix. This ultimately means it gets removed during the cleaning steps of the WGCNA analysis.

Why is this happening?

Below is the code I've used for each normalization approach.

    library(DESeq2)

    data_env <- data[, 1:5]     # environmental and sample info
    data_sp  <- data[, -(1:5)]  # taxa counts
    data_sp.counts <- as.data.frame(t(data_sp))  # transpose so taxa are rows and samples are columns

    # condition = the 10 different flower species sampled across 80 sites
    data_env.coldata <- data.frame(rows = colnames(data_sp.counts),
                                   condition = as.factor(data$Species))
    data_env.coldata$rows <- as.character(data_env.coldata$rows)

    # Normalization method 1
    dds <- DESeqDataSetFromMatrix(countData = data_sp.counts,
                                  colData = data_env.coldata,
                                  design = ~ condition)

    # DESeq() already runs estimateSizeFactors(), estimateDispersions() and
    # nbinomWaldTest() in sequence, so they are not called again afterwards.
    dds <- DESeq(dds)

    # Normalized counts are the raw counts divided by each sample's size factor
    normalized_counts <- counts(dds, normalized = TRUE)


    # Normalization method 2
    dds <- DESeqDataSetFromMatrix(countData = data_sp.counts,
                                  colData = data_env.coldata,
                                  design = ~ condition)
    vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
    mat <- assay(vsd)

In either case, the top fungus has almost identical values across samples (they aren't identical if you look at every significant digit, but they are once rounded) and gets removed by WGCNA.

DESEQ2 WGCNA • 603 views
7 months ago
LChart 4.5k

This could be an edge case that happens when all or nearly all genes have at least one sample with a 0 count; this can distort the size factor estimates. What happens if you run estimateSizeFactors(dds, type='iterate')?
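
To make that edge case concrete, here is a minimal base-R sketch of the median-of-ratios idea. It is not DESeq2's actual code: the toy matrix, the taxon names and the helper `median_of_ratios` are all hypothetical. It assumes that only one taxon (here `top_fungus`) is non-zero in every sample, in which case that taxon alone defines the reference and its normalized values collapse to its own geometric mean.

```r
# Toy counts (hypothetical): taxa in rows, samples in columns.
# Only "top_fungus" has a non-zero count in every sample.
counts <- matrix(
  c(120, 300,  80, 510,   # top_fungus
      0,  14,   3,   0,   # taxon_b
      7,   0,  22,   5,   # taxon_c
      0,   0,   9,  40),  # taxon_d
  nrow = 4, byrow = TRUE,
  dimnames = list(c("top_fungus", "taxon_b", "taxon_c", "taxon_d"),
                  paste0("s", 1:4))
)

# Median-of-ratios size factors. A row containing any zero has a log
# geometric mean of -Inf, so it is dropped from the reference set.
median_of_ratios <- function(m) {
  log_geo_means <- rowMeans(log(m))
  usable <- is.finite(log_geo_means)
  apply(m, 2, function(col) {
    exp(median(log(col[usable]) - log_geo_means[usable]))
  })
}

size_factors <- median_of_ratios(counts)
normalized <- sweep(counts, 2, size_factors, "/")

# Every sample now shows (up to floating-point error) the same value for
# top_fungus: its geometric mean across samples.
normalized["top_fungus", ]
```

If the real combined dataset behaves like this toy one, the size factors are effectively "top taxon count divided by its geometric mean", which would explain why that single feature comes out flat while the others do not. Counting how many taxa have no zeros at all, e.g. `sum(rowSums(data_sp.counts == 0) == 0)`, is a quick way to check whether this applies.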


I got this error when I ran that:

Error in estimateSizeFactorsIterate(object) : 
  iterative size factor normalization did not converge

Actually, calling estimateSizeFactors(dds, type="poscounts") worked. I didn't realize this was an option. It seems to work with zero-inflated data and suits datasets where every gene/feature/taxon has a 0 in at least one sample.
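
For anyone wondering why "poscounts" helps, below is a rough base-R sketch of the idea as I understand it. It is an approximation for illustration, not DESeq2's implementation, and the toy matrix and the helpers `geo_mean_nz` and `poscounts_like` are hypothetical names. The assumption is that each taxon's geometric mean is taken over its positive counts only (still divided by the total number of samples), so taxa with zeros stay in the reference instead of being dropped.

```r
# Toy counts (hypothetical): taxa in rows, samples in columns.
counts <- matrix(
  c(120, 300,  80, 510,   # top_fungus: the only taxon without zeros
      0,  14,   3,   0,   # taxon_b
      7,   0,  22,   5,   # taxon_c
      0,   0,   9,  40),  # taxon_d
  nrow = 4, byrow = TRUE,
  dimnames = list(c("top_fungus", "taxon_b", "taxon_c", "taxon_d"),
                  paste0("s", 1:4))
)

# Geometric mean that ignores zeros in the product but still divides
# by the full number of samples; an all-zero row gets 0.
geo_mean_nz <- function(x) {
  if (all(x == 0)) 0 else exp(sum(log(x[x > 0])) / length(x))
}

# Size factors from the median ratio over the positive counts of each
# sample, rescaled so the size factors have geometric mean 1.
poscounts_like <- function(m) {
  log_geo_means <- log(apply(m, 1, geo_mean_nz))
  sf <- apply(m, 2, function(col) {
    keep <- is.finite(log_geo_means) & col > 0
    exp(median(log(col[keep]) - log_geo_means[keep]))
  })
  sf / exp(mean(log(sf)))
}

size_factors <- poscounts_like(counts)
normalized <- sweep(counts, 2, size_factors, "/")

# top_fungus now varies across samples, because several taxa
# contribute to every sample's size factor.
normalized["top_fungus", ]
```

In the real workflow the equivalent step is just the call quoted above, followed by counts(dds, normalized=TRUE) or the variance stabilizing transformation on the same dds object.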
