Question

Batch correction for RNA-seq using ComBat-seq for same group and batch

0

Entering edit mode

8 months ago

botloggy ▴ 10

I am trying to run ComBat_seq to correct for batch effects and then use DESeq2 to perform differential expression analysis. For context, I have a PBMC data from Day1 and Day6 which is group_id. The patient_id refers to the paired samples at Day1 and Day6. I did PCA analysis and found that there is a clear batch effect in Day1 and Day6 samples. Now, I want to correct the batch effect. But, when I run the batch correction method ComBat_seq I get errors.

Below is my code:

dds <- DESeqDataSetFromMatrix(cluster_counts,
                            colData = ei,
                            design = ~ group_id + patient_id)
corrected <- ComBat_seq(counts=counts(dds), batch = dds$batch_id, group = group_id)

Error:

Error: The covariate is confounded with batch! Remove the covariate and rerun ComBat_seq.

If I run without group parameter or setting it to NULL then I won't get error.

corrected <- ComBat_seq(counts=counts(dds), batch = dds$batch_id, group = NULL)
dds <- DESeqDataSetFromMatrix(corrected,
                            colData = ei,
                            design = ~ group_id + patient_id)

My question is how do I correct (or If it is possible to correct) for both the batch_id and group_id when they are both same. Are there any alternative approaches probably in DESeq2 pipeline to account for batch effects in this scenario.

SVA ComBat_seq ComBat DESeq2 RNA-seq • 1.1k views

ADD COMMENT • link 8 months ago by botloggy ▴ 10

0

Entering edit mode

The description is difficult to understand. Could you post the result of head(ei) please?

ADD REPLY • link 8 months ago by BioinfGuru ★ 2.1k

0

Entering edit mode

Sure, the head(ei) would look something as shown below @bioinfguru.

group_id sample_id batch_id patient_id

day1 A1_day1 batch1 A1

day1 A2_day1 batch1 A2

day1 A3_day1 batch1 A3

day6 A1_day6 batch2 A1

day6 A2_day6 batch2 A2

day6 A3_day6 batch2 A3

ADD REPLY • link 8 months ago by botloggy ▴ 10

0

Entering edit mode

I think what is happening here is group_id and batch_id are essentially the same thing. So just remove "group = group_id". Bear in mind though that if this is a time series experiment, then actually you should not be treating it as a batch effect.

To expand:

The batch_id column is a copy of group_id so should be removed. Your metadata is the same as:

day    patient sample
1        1     1.1
1        2     1.2
1        3     1.3
6        1     6.1
6        2     6.2
6        3     6.3

So it sounds like the PCA separated the groups by "day", but "day" is not a variable of interest to you so you want to remove "day" as a batch effect. So this is not a time series. Is that right?

On the other hand, if "day" is something that is important (i.e. this is a comparison to see how the samples change between day1 and day6) then the PCA did not identify a batch effect, it identified there is a clear change between day1 and day6.

ADD REPLY • link 8 months ago by BioinfGuru ★ 2.1k

0

Entering edit mode

So, the data I have is longitudinal with sample patients PBMC collected at different timepoints. For example, Day1, Day6, Day12, Day24. I just did pairwise comparison between Day1 and Day6 to begin with and found that there is a batch effect between Day1 and Day6. I think this might be attributed to sample collection that happens at different times. Is there any solution in this scenario?

ADD REPLY • link 8 months ago by botloggy ▴ 10

0

Entering edit mode

Ok .. so I didn't realise there were more time points. Can you post the full metadata (ei) and also the PCA with all samples included and colored or shaped by day. This doesn't sound like a batch effect.

ADD REPLY • link 8 months ago by BioinfGuru ★ 2.1k

0

Entering edit mode

There are more time points but I am focusing only on the first comparison (Day1 vs Day6) for which I just put the sample metadata above.

ADD REPLY • link 8 months ago by botloggy ▴ 10