I understand that batch effects are variation in the data that is not biological but comes from outside factors, such as who took the samples, when they were taken (AM or PM), and so on.
My situation is the following: I'm gathering RNA-seq transcriptomic data from people who were treated with a certain cancer drug, with the samples taken before treatment. So I have a label for whether each person responded or not, and I have their pre-treatment RNA-seq data. After gathering the data, I run a cell type enrichment analysis on each dataset individually, to see whether the people who respond and the people who do not respond have different tumor cellular profiles. The data is heterogeneous: it comes from different datasets and different cancer types. I have datasets of melanoma, lung cancer, renal cell carcinoma, gastro, and more cancer types.
After doing the cell type enrichment analysis for each dataset separately, I combine all the scores into one comprehensive data frame, called scores, and do batch effect correction with this code:
scores.batch = limma::removeBatchEffect(scores, batch = scores$cancertype)
What does this do exactly, and do you think I should also correct for dataset?