Question

Outlier after DEseq2 in meta-analysis

1

Entering edit mode

5 months ago

a.stef.44 ▴ 10

I am performing a meta-analysis and used ComBat-Seq for batch adjustment. The PCA plot indicated good data homogenization. I performed DESeq2, including tumor purity as a covariate. However, every time I try to visualize the top 10 DEGs, I notice that one sample consistently appears as an outlier. I attempted to exclude this sample from the analysis, but when I re-run DESeq2 and visualize the results, another sample becomes an outlier. Essentially, regardless of what I do, one sample always shows the highest intensity in visualized the dds.norm data.

Do you know why this is happening. I didn't realized the same effect without including tumor.purity as covariance (maybe it still exist but not so obvious).

Is it expected to be happened? Is there anything I can do?

Barplot of top 2 DEGs

I am comparing R/NR and I am expecting to see difference between two group, In this case it seems that whole statistics is driven by one outlier.

Code I used:

dds0 <- DESeqDataSetFromMatrix(countData = count.table, colData = Pheno, design = ~Tumor.purity+ Pathological.response)
dds.norm <-  estimateSizeFactors(dds0)
sizeFactors(dds.norm)
## Performing estimation of dispersion parameter
dds.disp <- estimateDispersions(dds.norm)
alpha <- 0.0001
wald.test <- nbinomWaldTest(dds.disp)
res.DESeq2 <- results(wald.test, alpha=alpha, pAdjustMethod="BH")

#############################################################################################
png("Normalized counts plot.4.png", width=50, height=15, units="in", res=600)

par(mfrow = c(3, 3))  # 2 reda, 5 kolona
par(mar = c(4, 4, 8, 2))

gn.most.sign <- list()

for (i in 1:9) {
  gn.most.sign[[i]] <- rownames(res.DESeq2.pvalue.05.lfc.1)[i]
  gn.most.diff.val <- counts(dds.norm, normalized=T)[gn.most.sign[[i]],]
  barplot(gn.most.diff.val, col=Pheno$Color2,  main=gn.most.sign[[i]], las=2, cex.names=0.5,)
}

dev.off()

meta-analysis rna-seq outlier DESeq2 • 521 views

ADD COMMENT • link 5 months ago by a.stef.44 ▴ 10

score 1 · Answer 1 · 2024-10-14

Difficult to know without a deep dive into the colData and the counts. Its worth noting that DESeq2 automatically excludes outliers where it can. However, its possible that these outliers are are connected to the adjustment for tumour purify, and you should be looking at the counts after correction for tumour purity, or at least viewing tumour purify alongside.

One possibility is to take the DESeq2 normalised count matrix, and pass it through limma's removeBatchEffect to remove the batch effect before plotting. Personally, it also use the rlogs or vst normalised data for this sort of thing rather than the raw counts.

Another possbility that would might make it clear what was happening would be to plot the counts against the tumour purify for each of your sig genes., coloring by sample condition.