I am trying to run ComBat_seq to correct for batch effects and then use DESeq2 to perform differential expression analysis. For context, I have a PBMC data from Day1
and Day6
which is group_id
. The patient_id
refers to the paired samples at Day1
and Day6
. I did PCA analysis and found that there is a clear batch effect in Day1 and Day6 samples. Now, I want to correct the batch effect. But, when I run the batch correction method ComBat_seq
I get errors.
Below is my code:
dds <- DESeqDataSetFromMatrix(cluster_counts,
colData = ei,
design = ~ group_id + patient_id)
corrected <- ComBat_seq(counts=counts(dds), batch = dds$batch_id, group = group_id)
Error:
Error: The covariate is confounded with batch! Remove the covariate and rerun ComBat_seq.
If I run without group parameter or setting it to NULL
then I won't get error.
corrected <- ComBat_seq(counts=counts(dds), batch = dds$batch_id, group = NULL)
dds <- DESeqDataSetFromMatrix(corrected,
colData = ei,
design = ~ group_id + patient_id)
My question is how do I correct (or If it is possible to correct) for both the batch_id
and group_id
when they are both same. Are there any alternative approaches probably in DESeq2 pipeline to account for batch effects in this scenario.
The description is difficult to understand. Could you post the result of
head(ei)
please?Sure, the
head(ei)
would look something as shown below @bioinfguru.I think what is happening here is group_id and batch_id are essentially the same thing. So just remove "group = group_id". Bear in mind though that if this is a time series experiment, then actually you should not be treating it as a batch effect.
To expand:
The batch_id column is a copy of group_id so should be removed. Your metadata is the same as:
So it sounds like the PCA separated the groups by "day", but "day" is not a variable of interest to you so you want to remove "day" as a batch effect. So this is not a time series. Is that right?
On the other hand, if "day" is something that is important (i.e. this is a comparison to see how the samples change between day1 and day6) then the PCA did not identify a batch effect, it identified there is a clear change between day1 and day6.
So, the data I have is longitudinal with sample patients PBMC collected at different timepoints. For example, Day1, Day6, Day12, Day24. I just did pairwise comparison between Day1 and Day6 to begin with and found that there is a batch effect between Day1 and Day6. I think this might be attributed to sample collection that happens at different times. Is there any solution in this scenario?
Ok .. so I didn't realise there were more time points. Can you post the full metadata (ei) and also the PCA with all samples included and colored or shaped by day. This doesn't sound like a batch effect.
There are more time points but I am focusing only on the first comparison (Day1 vs Day6) for which I just put the sample metadata above.