Hello
I am working on an RNA-seq project comparing cancer samples infected with a virus to uninfected cancer samples. I obtained DEGs with an FDR < 0.05 and a log2 fold change (LFC) > 2, and I also ran a gene set enrichment analysis, which gave meaningful results.
However, I am facing a problem with the batch correction step. The data were retrieved from different projects, all of the same cancer type. I used the SVA and RUVSeq packages for batch correction, but I still don't see separation by biological condition: on the PCA plot the conditions are mixed, and samples from the same batch cluster together. Additionally, some batches sit close to each other on the PCA plot.
Can I rely on these DEGs and proceed with publishing them?
I don't believe in batch correction. Experiment and controls have to come from the same batch. If you start wrong, you'll always be in doubt.
Can OP go back in time and change when things were sequenced? If not, this answer makes no sense. You are free to "not believe" in science, but that doesn't make it untrue.
Personally, I don't think batch correction (ComBat, specifically) works for my use case but it might work for others.
Please give some thought to whether or not your post actually answers the question before adding it as an answer. I've moved your post to a comment now.
To what extent is batch confounded with treatment?
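One quick way to check this is to cross-tabulate batch against condition from the sample sheet. Below is a minimal sketch in plain Python; the batch names, condition labels, and the helper names `crosstab` and `single_condition_batches` are hypothetical placeholders, not taken from the original data.

```python
from collections import Counter

# Hypothetical sample sheet: one (batch, condition) pair per sample.
# Replace with the real metadata from the projects being combined.
samples = [
    ("project_A", "infected"),
    ("project_A", "infected"),
    ("project_A", "uninfected"),
    ("project_B", "uninfected"),
    ("project_B", "uninfected"),
    ("project_C", "infected"),
]

def crosstab(pairs):
    """Count samples for every (batch, condition) combination."""
    return Counter(pairs)

def single_condition_batches(pairs):
    """Return batches that contain only one condition."""
    conditions_by_batch = {}
    for batch, condition in pairs:
        conditions_by_batch.setdefault(batch, set()).add(condition)
    return sorted(b for b, conds in conditions_by_batch.items() if len(conds) == 1)

counts = crosstab(samples)
batches = sorted({b for b, _ in samples})
conditions = sorted({c for _, c in samples})

# Print a small batch-by-condition table.
print("batch".ljust(12) + "".join(c.ljust(12) for c in conditions))
for b in batches:
    print(b.ljust(12) + "".join(str(counts[(b, c)]).ljust(12) for c in conditions))

print("Batches with a single condition:", single_condition_batches(samples))
```

If every batch contains only one condition, batch and treatment are completely confounded, and no correction method can separate the two effects. If at least some batches contain both conditions, including batch as a covariate in the model design is one option to consider.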