I'm unsure how to tackle preprocessing for machine learning. Typically, a scaler or normalizer is fitted on the training set alone, and that fitted transform is then applied to both the training and the test data. The point is to keep information from the test set from leaking into training, which would inflate the estimate of model performance.
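To be concrete about the pattern I mean, here is a minimal sketch using scikit-learn (the `StandardScaler` and the toy random matrix are just illustrative stand-ins for whatever transform and expression data you actually have):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy stand-in for an expression matrix: 100 samples x 4 features.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 4))

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# Fit on the training split ONLY: the mean/std parameters are
# estimated without ever seeing the test samples.
scaler = StandardScaler().fit(X_train)

# The same fitted transform is then applied to both splits.
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
```

After this, the training split is exactly zero-mean/unit-variance, while the test split is only approximately so, because its statistics never influenced the fit.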
I'm lost as to how this process can apply to common microarray preprocessing algorithms that rely on information from all arrays in an experiment, such as Robust Multi-array Average (RMA) normalization and ComBat batch-effect correction. I've searched the literature for how this is handled, but papers rarely go further than "Data were preprocessed using X and Y, and batch effects were corrected using Z," followed by the algorithms used for classification.
Are there resources or recommendations on best practices for machine learning with gene expression data that I'm missing? Thanks for any help.