Question

Correcting for batch effect in RNA-seq data

2

6.0 years ago

Rimma ▴ 30

I used DESeq2 to process RNA-seq data from different sources. When I plotted a PCA, I found a strong batch effect (the different point shapes represent the 3 batches; for example, ctr and PH.7d samples from different batches cluster apart):

enter image description here

I tried to remove it using the limma package as described here:

colData
      sample   condition batch
1         100       PH.7d     1
..........
7          75         ctr     1
8  SRR5035380 hblast.10.5     2
..........
25 SRR5035397 hblast.18.5     2
26 SRR8437299         ctr     3
..........
37 SRR8437324       PH.7d     3

vsd<-vst(dds)
assay(vsd)<-limma::removeBatchEffect(assay(vsd),vsd$data1)
data2<-plotPCA(vsd, intgroup=c('condition','batch'),returnData=T)
data2<-as.data.frame(data2)
percentVar<- round(100*attr(data2,'percentVar'))
plot2<-qplot(PC1,PC2,color=condition,shape=batch,data=data2)

However, nothing changes when I plot the results:

enter image description here

What am I doing wrong?

I also tried to account for the batch effect by including it in the DESeq2 design:

ddsB=DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~batch+condition)

I'm getting this error:

Error in checkFullRank(modelMatrix) : 
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.

Can somebody help me solve it?

Thanks in advance!

RNA-Seq batch-effect • 4.8k views

updated 15 months ago by Ram 45k • written 6.0 years ago by Rimma ▴ 30

1

Are you sure that vsd$data1 corresponds to the vector encoding the batch variable? Seems to me it should be vsd$batch.

6.0 years ago by Friederike 9.0k
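
A minimal sketch of the point raised in this comment, assuming that `colData` has no column named `data1`, so that `vsd$data1` evaluates to `NULL`. To my understanding, `limma::removeBatchEffect()` hands back its input unchanged when no batch or covariate is supplied, which would explain the identical PCA. The matrix below is simulated and only stands in for `assay(vsd)`; the sample counts, the size of the batch shift, and the variable names are illustrative assumptions, not the OP's data.

```r
library(limma)

set.seed(1)
# Simulated stand-in for assay(vsd): 100 genes x 8 samples (not the OP's data)
batch     <- factor(rep(c("1", "3"), each = 4))
condition <- factor(rep(c("ctr", "PH.7d"), times = 4))
expr <- matrix(rnorm(100 * 8), nrow = 100)
expr[, batch == "3"] <- expr[, batch == "3"] + 2   # additive batch shift

# No batch supplied (what a NULL vsd$data1 amounts to): nothing is removed
unchanged <- removeBatchEffect(expr, batch = NULL)

# Passing the actual batch factor; the design protects the condition effect
design    <- model.matrix(~ condition)
corrected <- removeBatchEffect(expr, batch = batch, design = design)

# Difference between the batch means before and after correction
shift_before <- mean(expr[, batch == "3"]) - mean(expr[, batch == "1"])
shift_after  <- mean(corrected[, batch == "3"]) - mean(corrected[, batch == "1"])
```

With the objects from the question, the corresponding change would be to pass `vsd$batch` instead of `vsd$data1`.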

0

It looks like batch 2 doesn't contain any of the groups in batch 1 and 3, so it is not possible to correct for that batch. Are you sure there is at least one group in batch 2 that is also found in batch 1 and 3?

6.0 years ago by Benn 8.4k
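
The confounding described in this comment can be checked in base R before building the DESeq2 object. The sketch below uses a toy sample sheet that mirrors only the visible rows of `colData`; the number of replicates per group is made up, and the elided rows of the real table may differ. It cross-tabulates batch against condition and compares the rank of the `~batch+condition` model matrix with its number of columns, which is the property the `checkFullRank` error complains about.

```r
# Toy sample sheet mirroring the visible rows of colData
# (replicate numbers are hypothetical, not the OP's)
toy <- data.frame(
  batch     = factor(rep(c("1", "2", "3"), each = 4)),
  condition = factor(c("ctr", "ctr", "PH.7d", "PH.7d",
                       "hblast.10.5", "hblast.10.5", "hblast.18.5", "hblast.18.5",
                       "ctr", "ctr", "PH.7d", "PH.7d"))
)

# Cross-tabulation: batch 2 shares no condition with batches 1 and 3
print(table(toy$batch, toy$condition))

# Rank check on the design that was rejected
mm_all   <- model.matrix(~ batch + condition, data = toy)
rank_all <- qr(mm_all)$rank   # smaller than ncol(mm_all): not full rank

# Restricting to the batches that share conditions gives a full-rank design
sub      <- droplevels(toy[toy$batch != "2", ])
mm_sub   <- model.matrix(~ batch + condition, data = sub)
rank_sub <- qr(mm_sub)$rank   # equal to ncol(mm_sub)
```

In this toy layout the batch 2 indicator is exactly the sum of the two hblast indicators, so the batch 2 effect cannot be separated from those conditions.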

score 1 · Answer 1 · 2019-08-07

1

6.0 years ago

ATpoint 88k

From what I've seen, RNA-seq is strongly confounded by the kit and library preparation method. The confounding effect probably dominates any kind of biological difference; see, for example, this PCA that I made from five independent data sources, processed identically in silico.

Edit: Check whether a correct batch removal attempt, as Benn says below, can limit the confounding effect.

enter image description here

6.0 years ago by ATpoint 88k

0

But you can correct for batch when there are overlapping groups. However, I suspect that OP's batch 2 doesn't have any overlapping group...

6.0 years ago by Benn 8.4k

0

True, but to what extent? Do you have experience with how well this works? I mean, "mild" batch effects like different culture conditions in the lab, samples taken on different days, or different sequencing protocols might be correctable, but can you really "regress out" the effect of different kits and laboratories?

6.0 years ago by ATpoint 88k

0

I only have experience with removeBatchEffect() from edgeR/limma; it works fine, especially for visualization. Clearly the limma::removeBatchEffect code from OP did not work properly, as Friederike already suspects.

6.0 years ago by Benn 8.4k