There are several questions about this topic, but none of them received a clear answer, so here is my attempt to find one:
I am working with a proteome dataset that clearly has a batch effect from the date the MS measurement was done. The dataset is also not normalised yet. Now, which is the correct order in which to process the data?
1) Normalise the data first and then remove the batch effect (I use limma's removeBatchEffect)
OR
2) Remove the batch effect from the data first and then normalise?
Please give straightforward answers. If you feel the need to write "well, it depends on what you want to do...", please explain what you actually mean by that. I do not plan to normalise parts of the data in different ways: I will normalise all samples together and batch-correct them all together (for date_processed only). The batch-corrected data will only be used for visualisation, clustering and ML classification, not for differential expression (there, the batch will be included as a covariate).
Thanks a lot for your advice!
Sebastian
(I will try both orders, do more research and post the results here to hopefully resolve this issue for good).
I always normalize first as a general approach. Normalization usually seeks to correct intrinsic variation in your samples that comes from the technology, and it is designed with those issues in mind, not the effects of a batch, which may arise from many independent causes. It would help to know whether the normalization algorithm you will apply is within-sample or between-samples. If it is within-sample, I would definitely normalize first, because each sample is transformed using only its own values, so there is less chance of the batch influencing the other samples' normalization. If it is between-samples, I would say the order can depend on the normalization procedure at hand and on the effect that the batch is having on the data. Some strong normalizations, such as quantile normalization, may even help remove the batch effect.
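
To make the "normalize first, then remove the batch effect" order concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the function names `median_normalize` and `center_batches` are hypothetical, the toy log2 intensities are made up, and the per-batch mean centering is only a simplified stand-in for what limma's removeBatchEffect does (it is not that function and ignores any other covariates).

```python
from statistics import mean, median

def median_normalize(data):
    """Within-sample normalization: subtract each sample's own median.

    `data` maps sample name -> list of log2 intensities (one per protein).
    Each sample is transformed using only its own values.
    """
    return {s: [v - median(vals) for v in vals] for s, vals in data.items()}

def center_batches(data, batch_of):
    """Simplified batch correction (hypothetical helper, not limma).

    For every protein, subtract the mean of the sample's batch and add
    back the overall mean, so all batches share the same per-protein mean.
    """
    samples = list(data)
    n_proteins = len(data[samples[0]])
    out = {s: list(data[s]) for s in samples}
    for p in range(n_proteins):
        overall = mean(data[s][p] for s in samples)
        for b in set(batch_of.values()):
            members = [s for s in samples if batch_of[s] == b]
            batch_mean = mean(data[s][p] for s in members)
            for s in members:
                out[s][p] = data[s][p] - batch_mean + overall
    return out

# Toy log2 intensities (made up): 4 samples x 3 proteins, 2 measurement dates.
toy = {
    "s1": [20.0, 22.0, 24.0],
    "s2": [20.5, 22.5, 24.5],
    "s3": [21.0, 24.0, 25.0],
    "s4": [21.5, 24.5, 25.5],
}
batch_of = {"s1": "day1", "s2": "day1", "s3": "day2", "s4": "day2"}

normalized = median_normalize(toy)                 # step 1: normalize
corrected = center_batches(normalized, batch_of)   # step 2: batch-correct
```

Because the median normalization only looks at one sample at a time, running it on a single sample gives the same result as running it on the whole dataset, which is the reason the batch cannot leak into other samples at that step. A between-samples method would not have this property, so the order would need more thought there.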