Batch correction for RNA-seq didn't work with ComBat-seq tool
2.5 years ago
ChuYi • 0

Hello,

I was using ComBat-seq from the sva package to correct the batch effects for my RNA-seq samples, which were from two different batches but exactly the same platform.

Before doing this, I checked my data for batch effects. Here is the code I used and the result it returned:

library(DESeq2)
sample_info <- data.frame(sample = colnames(count_mat),
                            condition = factor(rep(c("pre_treat", "post_treat"),
                                                   times = 9)),
                            batch = factor(c(rep(1, 8), rep(2, 10))))
rownames(sample_info) <- sample_info$sample
dds <- DESeqDataSetFromMatrix(countData = count_mat,
                                colData = sample_info,
                                design= ~ condition+batch)
pca_dat <- plotPCA(DESeqTransform(dds), intgroup=c("condition", "batch"), 
                     returnData=TRUE)

[Image: PCA plot before batch correction]

I then ran the batch effect correction with ComBat-seq, using the raw counts matrix as input, and plotted the PCA in the same way:

gene_exp_adj <- ComBat_seq(counts = counts, batch = sample_info$batch, 
                             group = sample_info$condition)

But apparently the results didn't change much: [Image: PCA plot after batch correction]

I also performed the same analysis on my lncRNA datasets, and the results before and after batch correction were very similar. Does anyone know why? Or is there another tool I could try? Any advice would be appreciated!

RNA-seq effects batch ComBat-seq • 3.3k views

What do you mean when you say it didn't work? Was there an error, or was the batch effect not resolved?


The latter case: the batch effects didn't seem to be resolved.


Are you sure there are any actual batch effects to begin with? I’m not an RNA-seq expert, but it doesn’t seem like it.


Maybe the results for my lncRNA datasets are more obvious; nothing seems to have changed:

[Image: before batch correction]

[Image: after batch correction]

2.5 years ago
ATpoint 85k

First of all, you should follow the DESeq2 manual and use plotPCA correctly. You are explicitly giving it a DESeqTransform object (the manual does not suggest that, and it makes no sense), and the axis limits of the PCA indicate that the data are not log-transformed and, based on the code, probably not normalized either. Please read the manual, do exactly what it does for the PCA, and repeat the plots. My guess is that you are doing PCA on raw counts, which makes no sense at all. So: manual => DESeqDataSet() => vst() => plotPCA().
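The sequence named above (DESeqDataSet, then vst(), then plotPCA()) could look roughly like the sketch below. It does not use the poster's actual data: count_mat is simulated with makeExampleDESeqDataSet() purely as a stand-in, and the sample table reuses the layout from the question. blind = TRUE is simply the vst() default, not something this thread prescribes.

```r
library(DESeq2)

set.seed(1)
# ASSUMPTION: simulated stand-in for the real raw count matrix
# (3000 genes x 18 samples); replace with your own integer counts.
count_mat <- counts(makeExampleDESeqDataSet(n = 3000, m = 18))

# Same sample layout as in the question: alternating conditions,
# first 8 samples in batch 1, remaining 10 in batch 2.
sample_info <- data.frame(
  sample    = colnames(count_mat),
  condition = factor(rep(c("pre_treat", "post_treat"), times = 9)),
  batch     = factor(c(rep(1, 8), rep(2, 10))),
  row.names = colnames(count_mat)
)

dds <- DESeqDataSetFromMatrix(countData = count_mat,
                              colData   = sample_info,
                              design    = ~ batch + condition)

# vst() accounts for library size and returns values on a log2-like scale,
# wrapped in a DESeqTransform object that plotPCA() expects.
vsd <- vst(dds, blind = TRUE)

# returnData = TRUE gives the PC coordinates instead of a ggplot object.
pca_dat <- plotPCA(vsd, intgroup = c("condition", "batch"), returnData = TRUE)
percent_var <- round(100 * attr(pca_dat, "percentVar"))
```

With transformed data, the PC axes typically span tens of units rather than the very large ranges seen when PCA is run on raw counts.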


I've made a basic mistake, thank you so much⭐️⭐️


Is it fine now after applying these suggestions? If not, feel free to ask for clarification.


Thank you for your enthusiastic answer! After normalising the original data, I got the PCA results below:

Gene datasets, before batch correction: [image]

Gene datasets, after batch correction: [image]

I also tried the removeBatchEffect function from the limma package on the gene datasets. How can I determine which method gives the better batch correction?

Batch correction for gene datasets with limma: [image]

But ComBat-seq didn't handle my lncRNA datasets well.

lncRNA datasets, before batch correction: [image]

lncRNA datasets, after batch correction: [image]

So I also tried the removeBatchEffect function from the limma package on them, and it did seem to work:

Batch correction for lncRNA datasets with limma: [image]

I still don't know why the lncRNA datasets show a more obvious batch effect than the mRNA datasets, since both were sequenced from the exact same source, or why ComBat-seq failed to correct the batch effects for the lncRNA datasets.
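One possible way to approach the "which correction is better" question is to apply limma's removeBatchEffect() to the vst-transformed values and check how much of the leading PCs is still explained by batch. The sketch below is only an illustration under stated assumptions: the counts are simulated, the batch effect is injected artificially, batch_r2() is a hypothetical helper, and the R-squared heuristic is a convenience rather than an established benchmark. It also runs prcomp() on all genes, whereas plotPCA() uses the 500 most variable genes by default.

```r
library(DESeq2)
library(limma)

set.seed(1)
# ASSUMPTION: simulated stand-in data, not the poster's counts.
count_mat <- counts(makeExampleDESeqDataSet(n = 3000, m = 18))
sample_info <- data.frame(
  condition = factor(rep(c("pre_treat", "post_treat"), times = 9)),
  batch     = factor(c(rep(1, 8), rep(2, 10))),
  row.names = colnames(count_mat)
)

# ASSUMPTION: artificial batch effect, 1000 random genes tripled in batch 2.
hit   <- sample(nrow(count_mat), 1000)
in_b2 <- sample_info$batch == 2
count_mat[hit, in_b2] <- count_mat[hit, in_b2] * 3L

dds <- DESeqDataSetFromMatrix(count_mat, sample_info,
                              design = ~ batch + condition)
vsd <- vst(dds, blind = FALSE)

# Hypothetical helper: R^2 of each leading PC regressed on batch.
batch_r2 <- function(mat, batch, n_pcs = 2) {
  pcs <- prcomp(t(mat))$x[, seq_len(n_pcs), drop = FALSE]
  apply(pcs, 2, function(pc) summary(lm(pc ~ batch))$r.squared)
}

mat_before <- assay(vsd)

# Remove the batch term while protecting the condition effect via 'design'.
mm <- model.matrix(~ condition, data = sample_info)
mat_after <- removeBatchEffect(mat_before,
                               batch  = sample_info$batch,
                               design = mm)

r2_before <- batch_r2(mat_before, sample_info$batch)
r2_after  <- batch_r2(mat_after,  sample_info$batch)

# For plotting, put the corrected values back into a DESeqTransform.
vsd_corrected <- vsd
assay(vsd_corrected) <- mat_after
pca_corrected <- plotPCA(vsd_corrected,
                         intgroup = c("condition", "batch"),
                         returnData = TRUE)
```

The same batch_r2() check could be run on vst-transformed ComBat-seq output to put the two methods side by side. Corrected matrices of this kind are generally meant for visualisation; for differential expression, keeping batch in the design formula, as the original code already does, is the usual recommendation.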
