Outlier after DEseq2 in meta-analysis
1
1
Entering edit mode
5 weeks ago
a.stef.44 ▴ 10

I am performing a meta-analysis and used ComBat-Seq for batch adjustment. The PCA plot indicated good data homogenization. I performed DESeq2, including tumor purity as a covariate. However, every time I try to visualize the top 10 DEGs, I notice that one sample consistently appears as an outlier. I attempted to exclude this sample from the analysis, but when I re-run DESeq2 and visualize the results, another sample becomes an outlier. Essentially, regardless of what I do, one sample always shows the highest intensity in visualized the dds.norm data.

Do you know why this is happening. I didn't realized the same effect without including tumor.purity as covariance (maybe it still exist but not so obvious).

Is it expected to be happened? Is there anything I can do?

Barplot of top 2 DEGs

I am comparing R/NR and I am expecting to see difference between two group, In this case it seems that whole statistics is driven by one outlier.

Code I used:

dds0 <- DESeqDataSetFromMatrix(countData = count.table, colData = Pheno, design = ~Tumor.purity+ Pathological.response)
dds.norm <-  estimateSizeFactors(dds0)
sizeFactors(dds.norm)
## Performing estimation of dispersion parameter
dds.disp <- estimateDispersions(dds.norm)
alpha <- 0.0001
wald.test <- nbinomWaldTest(dds.disp)
res.DESeq2 <- results(wald.test, alpha=alpha, pAdjustMethod="BH")

#############################################################################################
png("Normalized counts plot.4.png", width=50, height=15, units="in", res=600)

par(mfrow = c(3, 3))  # 2 reda, 5 kolona
par(mar = c(4, 4, 8, 2))

gn.most.sign <- list()

for (i in 1:9) {
  gn.most.sign[[i]] <- rownames(res.DESeq2.pvalue.05.lfc.1)[i]
  gn.most.diff.val <- counts(dds.norm, normalized=T)[gn.most.sign[[i]],]
  barplot(gn.most.diff.val, col=Pheno$Color2,  main=gn.most.sign[[i]], las=2, cex.names=0.5,)
}

dev.off()
meta-analysis rna-seq outlier DESeq2 • 342 views
ADD COMMENT
1
Entering edit mode
5 weeks ago

Difficult to know without a deep dive into the colData and the counts. Its worth noting that DESeq2 automatically excludes outliers where it can. However, its possible that these outliers are are connected to the adjustment for tumour purify, and you should be looking at the counts after correction for tumour purity, or at least viewing tumour purify alongside.

One possibility is to take the DESeq2 normalised count matrix, and pass it through limma's removeBatchEffect to remove the batch effect before plotting. Personally, it also use the rlogs or vst normalised data for this sort of thing rather than the raw counts.

Another possbility that would might make it clear what was happening would be to plot the counts against the tumour purify for each of your sig genes., coloring by sample condition.

ADD COMMENT
0
Entering edit mode

Dear i.sudbery,

Thank you for your response and valuable input! I had some additional thoughts. As I am using the ESTIMATE algorithm to calculate tumor purity, do you think it's a significant assumption to consider the algorithm 100% accurate, and should it be included as a covariate in the DE analysis? I’m not entirely sure if I should include it or not. On one hand, it could help account for heterogeneity caused by differences in tumor purity, but on the other hand, it might introduce additional bias.

Thank you in advance, Aleksandra

ADD REPLY

Login before adding your answer.

Traffic: 1847 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6