Question

How to remove batch effect from RNA-seq data?

3

9.4 years ago

alesssia ▴ 580

Dear all,

I received RNA-seq gene expression data that show batch effects from several technical confounders. I am performing a differential expression analysis using DESeq2 and have tried to add these effects as terms in my design (~batch1+batch2+condition), but some of them are linear combinations of the others, resulting in a model matrix that is not full rank.

Some people suggest using tools such as ComBat or SVA, but I am aware that the transformed values are no longer integer counts, and I wonder whether using these values will affect the DESeq2 results.

What is the correct way to remove batch effects in this case?

Thank you very much!

combat RNA-Seq SVA DEseq2 • 10k views

Updated 2.1 years ago by Ram 44k • written 9.4 years ago by alesssia ▴ 580
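To make the concern about non-integer values concrete, here is a toy sketch in Python. It is not ComBat or SVA: it only mean-centers each batch on the log2 scale and transforms back, and the counts and batch names are invented for illustration. Even this naive adjustment no longer yields integer counts, which DESeq2 expects as input.

```python
import math

# Hypothetical raw counts for a single gene, three samples per batch.
counts = {"batchA": [10, 12, 15], "batchB": [40, 44, 52]}

# Work on the log2(count + 1) scale.
log_counts = {b: [math.log2(c + 1) for c in v] for b, v in counts.items()}
n_total = sum(len(v) for v in log_counts.values())
grand_mean = sum(x for v in log_counts.values() for x in v) / n_total

adjusted = {}
for batch, values in log_counts.items():
    batch_mean = sum(values) / len(values)
    # Shift the batch so its log-scale mean equals the grand mean,
    # then map back to the count scale.
    adjusted[batch] = [2 ** (x - batch_mean + grand_mean) - 1 for x in values]

all_values = [x for v in adjusted.values() for x in v]
has_non_integer = any(abs(x - round(x)) > 1e-9 for x in all_values)
print(all_values)
print("contains non-integer values:", has_non_integer)
```

This is one reason the usual advice is to keep the raw counts and model the batch in the design instead of correcting the counts beforehand.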

Ram · Answer 1 · 2015-07-16

3

9.4 years ago

Michael Love ★ 2.6k

We have an example of using sva in the workflow: http://www.bioconductor.org/help/workflows/rnaseqGene/#batch

However, it sounds like you already know the batch. I don't understand how one would have multiple batch terms. What do these represent? What does your sample table look like (colData)?

Updated 2.1 years ago by Ram 44k • written 9.4 years ago by Michael Love ★ 2.6k

0

I have GC content, date of sequencing, and primer index. Unfortunately, on some days only a few (2-3) samples were sequenced.

Updated 2.1 years ago by Ram 44k • written 9.4 years ago by alesssia ▴ 580

0

It appears to me that GC content is a gene-level confounding variable, while date of sequencing and primer index are sample-level confounders. It is therefore not clear to me how you would account for GC content at the sample level (as the DESeq2 design ~batch1+batch2+condition would indicate). However, DESeq2 allows you to control for gene-level confounders when estimating the size factors. Please see the DESeq2 documentation for details on how to do that.

Updated 2.1 years ago by Ram 44k • written 9.4 years ago by lkmklsmn ▴ 980

0

Sorry if I answered too quickly without thinking. The main problem is that I have two conditions (cases and controls), and for about 80% of the samples, only cases or only controls were sequenced on a given day (and primer index makes things worse). I know that this is a really poor design, but that's the way it is. This yields a model matrix that is not full rank when I introduce the batches in the design.

Updated 2.1 years ago by Ram 44k • written 9.4 years ago by alesssia ▴ 580
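The rank problem described in this thread can be reproduced with a small sketch. The Python below is only an illustration under stated assumptions: the sample tables, the day labels and the helper names (`design_matrix`, `matrix_rank`) are all hypothetical, and the coding (intercept, day dummies with the first day as reference, a case indicator) is meant to mimic a design like ~day+condition, not to reproduce DESeq2's internals. When every day contains only cases or only controls, the condition column is a linear combination of the intercept and day columns, so the rank falls below the number of columns.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Look for a non-zero pivot at or below the current rank row.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def design_matrix(samples, day_levels):
    """Columns: intercept, one dummy per day after the first, case indicator."""
    rows = []
    for day, condition in samples:
        row = [1]
        row += [1 if day == d else 0 for d in day_levels[1:]]
        row.append(1 if condition == "case" else 0)
        rows.append(row)
    return rows

days = ["d1", "d2", "d3"]

# Hypothetical table: each sequencing day holds only cases or only controls.
confounded = [("d1", "case"), ("d1", "case"),
              ("d2", "control"), ("d2", "control"),
              ("d3", "case"), ("d3", "case")]

# Hypothetical table: every day holds both conditions.
balanced = [("d1", "case"), ("d1", "control"),
            ("d2", "case"), ("d2", "control"),
            ("d3", "case"), ("d3", "control")]

X_conf = design_matrix(confounded, days)
X_bal = design_matrix(balanced, days)

print("confounded: rank", matrix_rank(X_conf), "of", len(X_conf[0]), "columns")
print("balanced:   rank", matrix_rank(X_bal), "of", len(X_bal[0]), "columns")
```

In the confounded table, the day and condition effects cannot be separated by any model, because there is no day on which both conditions were observed. A check like this on the real sample table shows which batch terms are the ones colliding with the condition.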