15 months ago
JACKY
I want to apply limma's removeBatchEffect function to my data, but only after splitting it into training and test sets. I'm aware that applying batch correction before the split can leak information from the test set into the training set, so I want to avoid that. Previously, I've been batch correcting my entire dataset as follows:
cancer.type = metdata$Cancer_Type
correctedTPM = limma::removeBatchEffect(TPM, batch = cancer.type)
I'd like to change my approach: first correct the training set, then use the parameters derived from the training set to correct the test set. This is analogous to best practice for data scaling. Is there a way in R to do this with removeBatchEffect or another technique?
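To make the idea concrete, here is a minimal base-R sketch of "fit on train, apply to test". It is not part of limma: fit_batch_offsets and apply_batch_offsets are hypothetical helper names, and the toy matrix stands in for log-scale expression values. It assumes a batch-only model with no design matrix or covariates, in which case the training-set result should correspond to what removeBatchEffect(x, batch = ...) does (per-gene batch means centred on the unweighted mean of the batch means). It also assumes every batch in the test set was seen in training.

```r
# Hypothetical helpers (not limma functions): learn per-gene batch offsets
# from the training samples only, then subtract them from any sample set.

fit_batch_offsets <- function(x, batch) {
  # x: genes x samples matrix (assumed log-scale); batch: one label per sample
  batch <- factor(batch)
  batch_means <- matrix(0, nrow(x), nlevels(batch),
                        dimnames = list(rownames(x), levels(batch)))
  for (b in levels(batch)) {
    batch_means[, b] <- rowMeans(x[, batch == b, drop = FALSE])
  }
  # Offset = batch mean minus the unweighted mean of all batch means, per gene
  batch_means - rowMeans(batch_means)
}

apply_batch_offsets <- function(x, batch, offsets) {
  batch <- as.character(batch)
  unseen <- setdiff(batch, colnames(offsets))
  if (length(unseen) > 0) {
    stop("Batches not seen in training: ", paste(unseen, collapse = ", "))
  }
  # Pick each sample's batch column and subtract it gene by gene
  x - offsets[, batch, drop = FALSE]
}

# Toy data standing in for a log-transformed expression matrix (assumption)
set.seed(1)
expr <- matrix(rnorm(6 * 12), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("s", 1:12)))
batch <- rep(c("A", "B", "C"), each = 4)

# Split first, so the test samples never influence the fitted offsets
train_idx <- c(1, 2, 3, 5, 6, 7, 9, 10, 11)
test_idx  <- setdiff(seq_len(ncol(expr)), train_idx)

offsets <- fit_batch_offsets(expr[, train_idx], batch[train_idx])
train_corrected <- apply_batch_offsets(expr[, train_idx], batch[train_idx], offsets)
test_corrected  <- apply_batch_offsets(expr[, test_idx],  batch[test_idx],  offsets)
```

The offsets are estimated once from the training columns and reused unchanged on the test columns, which mirrors how scaling parameters are handled. If the model needs covariates or a design matrix, this simple mean-based version no longer applies as written and the coefficients would have to come from a fitted linear model instead.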
I've seen bad experimental design where biological variables get confounded with sequencing batches, but this is the first time I've encountered wanton disregard for biology and abuse of batch correction techniques.