Question

Correcting batch effects between RPKM datasets?

4

10.0 years ago

JacobS ▴ 1000

I have two datasets, one with ~250 samples and another with 7 samples. Both contain RPKM values computed from human RNA-Seq. I don't have access to the primary read files.

Is there a good way to batch-correct these datasets so that I can combine them and scan for expression signatures? To compare samples, I'm currently using an algorithm that takes the geometric mean of the RPKM values for the group of genes belonging to a specific signature, but the RPKM values in the ~250-sample dataset are on average much higher than those in the 7-sample dataset.
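For readers unfamiliar with this kind of scoring, the geometric-average signature score described above might look roughly like the sketch below. The function name, the gene names, and the pseudocount of 1 (used so that zero RPKM values do not break the logarithm) are assumptions for illustration, not details from the original post.

```python
import math

def signature_score(rpkm_by_gene, signature_genes, pseudocount=1.0):
    """Geometric mean of (RPKM + pseudocount) over the signature genes of one sample.

    The pseudocount is an assumption of this sketch; it is subtracted again
    at the end so the score stays on the RPKM scale.
    """
    values = [rpkm_by_gene[g] + pseudocount
              for g in signature_genes if g in rpkm_by_gene]
    if not values:
        raise ValueError("none of the signature genes are present in this sample")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log) - pseudocount

# Hypothetical sample: gene name -> RPKM
sample = {"GENE_A": 3.0, "GENE_B": 15.0, "GENE_C": 0.0}
score = signature_score(sample, ["GENE_A", "GENE_B"])
```

Because the score is a mean on the log scale, a dataset whose RPKM values are systematically higher will produce systematically higher scores, which is why the two datasets cannot be compared directly.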

I've used ComBat in the past for the same predicament but with microarray expression data, and it worked perfectly. I'm looking for something analogous for RPKM expression data.

DGE RNA-Seq batch RPKM • 4.9k views

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 10.0 years ago by JacobS ▴ 1000

0

What are you going to do with the combined data? If you are going to do differential expression analysis, what are the groups?

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 10.0 years ago by Sean Davis 27k

Ram · Answer 1 · 2015-08-07

2

10.0 years ago

Ying W ★ 4.3k

Have a look here.

ComBat should still work for RNA-seq; it can be found in the sva package.
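To illustrate the general idea of removing a per-gene batch shift, here is a deliberately simplified sketch: it log-transforms the RPKM values and then mean-centers each batch for a single gene. This is not ComBat. ComBat additionally adjusts scale and uses empirical Bayes shrinkage of the batch parameters, which matters when one batch has only a handful of samples. The function names, batch labels, and the log2(RPKM + 1) transform are assumptions of this sketch.

```python
import math
from statistics import mean

def log2_rpkm(values, pseudocount=1.0):
    """log2(RPKM + pseudocount); the pseudocount of 1 is an assumption."""
    return [math.log2(v + pseudocount) for v in values]

def center_by_batch(values, batches):
    """For one gene, shift each batch so its mean equals the overall mean.

    Location-only adjustment: no scale correction and no shrinkage,
    so this is a stand-in for illustration, not a ComBat implementation.
    """
    grand_mean = mean(values)
    adjusted = list(values)
    for label in set(batches):
        idx = [i for i, b in enumerate(batches) if b == label]
        batch_mean = mean(values[i] for i in idx)
        for i in idx:
            adjusted[i] = values[i] - batch_mean + grand_mean
    return adjusted

# Hypothetical RPKM values of one gene across two batches
rpkm = [7.0, 15.0, 31.0, 1.0, 3.0]
batches = ["large", "large", "large", "small", "small"]
logged = log2_rpkm(rpkm)              # [3.0, 4.0, 5.0, 1.0, 2.0]
adjusted = center_by_batch(logged, batches)
```

This kind of centering is only sensible when the biological composition of the batches is comparable; otherwise it removes real differences along with the batch shift.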

You could also have a look at the following packages:

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 10.0 years ago by Ying W ★ 4.3k

0

Just note that batch effect correction is not always compatible with the experimental design and questions.
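One common way this incompatibility shows up is confounding: if a biological group occurs in only one batch, its effect cannot be separated from the batch effect. A quick cross-tabulation of batch against group can reveal this before any correction is attempted. The labels and function names below are hypothetical.

```python
from collections import Counter

def batch_group_table(batches, groups):
    """Count samples for every (batch, group) combination."""
    return Counter(zip(batches, groups))

def single_batch_groups(batches, groups):
    """Groups seen in only one batch; these are confounded with batch."""
    seen = {}
    for b, g in zip(batches, groups):
        seen.setdefault(g, set()).add(b)
    return sorted(g for g, bs in seen.items() if len(bs) == 1)

# Hypothetical sample annotations
batches = ["large", "large", "large", "small", "small"]
groups = ["tumor", "normal", "tumor", "tumor", "tumor"]

table = batch_group_table(batches, groups)
confounded = single_batch_groups(batches, groups)
```

In this made-up example the "normal" group appears only in the large batch, so a tumor-versus-normal comparison across the combined data would be partly a batch comparison.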

ADD REPLY • link 10.0 years ago by Sean Davis 27k

0

Hi,

Thanks, that helps. But I was wondering whether the input data for ComBat should be the raw RPKM values or log2 of the RPKM values?

ADD REPLY • link updated 5.7 years ago by Ram 45k • written 9.8 years ago by AB ▴ 390