Dear community,
I have a problem with Combat-Seq, which I used to correct batch effects in my RNA-seq data.
I have posted a picture at the bottom, which shows you some PCA plots I generated. I have some miRNA RNA-seq data which shows clear batch separation mostly along PC2, but to some extent along PC1 as well (top left plot).
I used Combat-Seq to try and correct these batch effects, but as you can see from the plots it also introduces new clustering (bottom left). I don't understand why this is happening and would be grateful for some experiences, tips or help
The command I used to do the batch correction is this (here Combat-Seq is wrapped in the BatchQC R package):
se_object <- BatchQC::batch_correct(se = se_object, method = "ComBat-Seq",
assay_to_normalize = "counts", batch = "batch_no",
covar = c("v1_stat", "v1_age", "v1_sex"), output_assay_name = "CombatSeq_corrected")
the covariates refer to case-control status (v1_stat
), age and sex. Normalized expression on the right was generated using median of ratios (DESeq2 default)
Any help or explanation would be greatly appreciated.
Could you add "sex" as a covariate to the plot?
Here is the same plot with sex as color. I also added a confounding table. As you can see, the batch variable is highly correlated with the clinical status (case/control). This is due to the distribution of samples across batches with many batches containing only cases or controls and only a few mixed samples.
Could this be the reason for CombatSeq not working properly? And do you have any suggestions on how to deal with this problem?
Thanks for your help!