Remove batch effects on the train set to avoid information leakage

15 months ago

JACKY ▴ 160

I want to apply limma's removeBatchEffect function to my data, but only after splitting it into train and test sets. I'm aware that batch correcting before the split can leak information from the test set into training, so I want to avoid that. Until now, I've been batch correcting my entire dataset as follows:

cancer.type = metdata$Cancer_Type
correctedTPM = limma::removeBatchEffect(TPM, batch = cancer.type)

I'd like to change my approach: correct the training set first, then use the parameters estimated from the training set to correct the test set. This is analogous to best practice for data scaling, where the scaler is fit on the training data only. Is there a way to do this in R with removeBatchEffect or another technique?
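To make the idea concrete, here is a minimal base-R sketch of the fit-on-train, apply-to-test pattern. It is not a limma feature: `fit_batch_offsets` and `apply_batch_offsets` are hypothetical helper names, and the toy matrix `logexpr`, the `batch` labels, and the split indices are made up for illustration. It assumes log-scale expression values, a purely technical batch label, no design matrix or covariates, and that every batch in the test set also appears in the training set. Under those assumptions the per-gene offsets (batch mean minus the unweighted mean of batch means) should correspond to what removeBatchEffect subtracts when called with only `batch`, but I have not checked this against limma itself.

```r
# Hypothetical helper (not part of limma): estimate per-gene batch offsets
# from the training samples only.
fit_batch_offsets <- function(x, batch) {
  batch <- factor(batch)
  # One vector of per-gene means for each batch level
  means <- lapply(levels(batch), function(b) {
    rowMeans(x[, batch == b, drop = FALSE])
  })
  batch_means <- do.call(cbind, means)   # genes x batches
  colnames(batch_means) <- levels(batch)
  # Offset = batch mean minus the unweighted mean of the batch means
  batch_means - rowMeans(batch_means)
}

# Hypothetical helper: subtract previously estimated offsets from any samples.
apply_batch_offsets <- function(x, batch, offsets) {
  batch <- as.character(batch)
  # Batches never seen during training cannot be corrected this way
  stopifnot(all(batch %in% colnames(offsets)))
  x - offsets[, batch, drop = FALSE]
}

# Toy data (made up): 5 genes x 12 samples, batch "B" shifted by +2
set.seed(1)
logexpr <- matrix(rnorm(5 * 12), nrow = 5,
                  dimnames = list(paste0("gene", 1:5), paste0("s", 1:12)))
batch <- rep(c("A", "B"), each = 6)
logexpr[, batch == "B"] <- logexpr[, batch == "B"] + 2

train_idx <- c(1:4, 7:10)
test_idx  <- c(5:6, 11:12)

# Offsets are learned from the training samples only...
offsets <- fit_batch_offsets(logexpr[, train_idx], batch[train_idx])

# ...and then applied unchanged to both sets
train_corrected <- apply_batch_offsets(logexpr[, train_idx], batch[train_idx], offsets)
test_corrected  <- apply_batch_offsets(logexpr[, test_idx],  batch[test_idx],  offsets)
```

The test samples never contribute to `offsets`, which is the point of the split. Inside cross-validation the same two steps would have to be repeated within each fold.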

r limma batch-effect


I've seen bad experimental designs where biological variables get confounded with sequencing batches, but this is the first time I'm encountering wanton disregard for biology and abuse of batch correction techniques.

Reply by Ram 44k, 15 months ago
