Question

Data processing order: normalisation or batch effect correction first?

Sebastian Hesse, 4.1 years ago

There are several questions on this topic, but none of them received a clear answer, so here is my attempt to settle it:

I am working with a proteome dataset that has a clear batch effect related to the date of the MS measurement. The dataset is also not normalised yet. Which is the correct order to process the data?

1) Normalise the data first, then remove the batch effect (I use limma's "removeBatchEffect")

OR

2) Remove the batch effect first, then normalise?
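To see that the two orders are not interchangeable in general, here is a minimal Python sketch on hypothetical toy values. It assumes log-scale intensities, uses per-sample median centring as the normalisation, and uses per-protein batch mean-centring as a simplified stand-in for limma's removeBatchEffect (roughly what it does for a single batch factor with no design matrix). The function names are made up for illustration; this is not limma's code.

```python
from statistics import median

# Matrices are lists of rows: one row per protein, one column per sample (log scale).

def median_normalise(mat):
    """Within-sample normalisation: subtract each sample's median log-intensity."""
    n_samples = len(mat[0])
    meds = [median(row[j] for row in mat) for j in range(n_samples)]
    return [[row[j] - meds[j] for j in range(n_samples)] for row in mat]

def remove_batch_means(mat, batches):
    """Per protein, shift every batch so that all batch means equal the average of
    the batch means. Hypothetical, simplified stand-in for removeBatchEffect
    (one batch factor, no design matrix, no covariates)."""
    levels = sorted(set(batches))
    out = []
    for row in mat:
        means = {
            b: sum(v for v, bb in zip(row, batches) if bb == b) / batches.count(b)
            for b in levels
        }
        target = sum(means.values()) / len(levels)
        out.append([v - means[b] + target for v, b in zip(row, batches)])
    return out

# Hypothetical toy data: 3 proteins x 4 samples, measured on two dates.
batches = ["d1", "d1", "d2", "d2"]
raw = [
    [10, 11, 20, 19],  # protein strongly shifted by the date
    [12, 13, 12, 14],
    [15, 14, 14, 17],
]

order1 = remove_batch_means(median_normalise(raw), batches)  # option 1
order2 = median_normalise(remove_batch_means(raw, batches))  # option 2
print(order1)  # [[1.0, 1.0, 3.0, -1.0], [-1.25, -1.25, -0.75, -1.75], [2.0, 0.0, 1.0, 1.0]]
print(order2)  # [[0.0, 1.0, 2.0, 0.0], [-2.25, -1.25, -1.75, -0.75], [1.0, 0.0, 0.0, 2.0]]
```

In option 1 every protein ends with equal batch means, but the sample medians are no longer zero. In option 2 the medians are zero again, but the final median step reintroduces a small date difference (protein 1 has batch means 0.5 vs 1.0).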

Please give straight answers. If you feel the need to write "well, it depends on what you want to do...", please explain what exactly it depends on. I do not plan to normalise parts of the data in different ways; I will simply normalise all samples together and batch-correct all samples together (for date_processed only). The batch-corrected data will only be used for visualisation, clustering and ML classification, not for differential expression (there, the batch will be included as a covariate).
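For the differential-expression part, "batch as a covariate" means each protein is fitted with a linear model that contains both the biological group and the batch. Below is a plain least-squares sketch for one protein on hypothetical, noise-free toy values; limma additionally moderates the per-protein variances with empirical Bayes (eBayes), which is not reproduced here. The helper `ols` is hypothetical.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[k] for r in X) for k in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * z for x, z in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

# Hypothetical design: 6 samples; the groups are unevenly spread over the two dates.
group = [0, 1, 0, 1, 0, 1]
batch = [0, 0, 0, 1, 1, 1]
# Noise-free log-intensities for one protein: true group effect 1.5, date effect 2.0.
y = [20 + 1.5 * g + 2.0 * b for g, b in zip(group, batch)]

with_batch = ols([[1, g, b] for g, b in zip(group, batch)], y)
without_batch = ols([[1, g] for g in group], y)
print(f"group effect, batch as covariate: {with_batch[1]:.3f}")
print(f"group effect, batch ignored:      {without_batch[1]:.3f}")
```

With the batch term the group effect of 1.5 is recovered; without it, part of the date effect leaks into the group estimate (about 2.17), because group 1 has more samples from the second date.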

Thanks a lot for your advice!

Sebastian

(I will try all options, do more research and post the results here, hopefully to finally resolve this issue.)

Tags: limma, proteome, batch-effect, R

(Last modified 7 months ago by Ram.)

I always normalize first as a general approach. Normalization usually seeks to correct intrinsic variation in your samples that is due to the technology, and the methods are designed with those issues in mind, not with the effects of a batch, which may arise from many independent causes. It would help to know whether the normalization algorithm you will apply is within-sample or between-samples. Especially if it is within-sample, I would definitely normalize first, because there is less chance of the batch influencing the normalization transformation of the other samples. If it is between-samples, I would say it depends on the normalization procedure at hand and on the effect the batch has on the data. Some strong normalizations, such as quantile normalization, may even help remove the batch effect.

Comment by Papyrus, 4.1 years ago
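To illustrate the last point of this comment, here is a minimal sketch of textbook quantile normalisation, a between-samples method (the reference distribution is built from all samples). Ties are broken by position here; real implementations (e.g. in limma) may handle ties and missing values differently. The toy matrices are hypothetical: in case 1 the later date shifts every protein in a sample equally, in case 2 it shifts only one protein.

```python
def quantile_normalise(mat):
    """Quantile normalisation (rows = proteins, columns = samples): every value is
    replaced by the mean, across all samples, of the values holding the same rank.
    Ties are broken by position rather than averaged."""
    n_rows, n_cols = len(mat), len(mat[0])
    cols = [[row[j] for row in mat] for j in range(n_cols)]
    sorted_cols = [sorted(c) for c in cols]
    reference = [sum(sc[i] for sc in sorted_cols) / n_cols for i in range(n_rows)]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j, c in enumerate(cols):
        ranking = sorted(range(n_rows), key=lambda i: c[i])
        for rank, i in enumerate(ranking):
            out[i][j] = reference[rank]
    return out

# Hypothetical toy data: 3 proteins x 4 samples; samples 3 and 4 come from a later date.
# Case 1: the date shifts every protein in a sample by the same amount (+3).
global_shift = [
    [10, 11, 13, 14],
    [12, 12, 15, 15],
    [15, 14, 18, 17],
]
# Case 2: the date shifts only the first protein (+4).
specific_shift = [
    [10, 11, 14, 15],
    [12, 12, 12, 12],
    [15, 16, 15, 16],
]
print(quantile_normalise(global_shift))    # every row is constant: the shift is gone
print(quantile_normalise(specific_shift))  # rows 1 and 2 still differ between dates
```

In case 2 the first protein keeps a date difference (11.25 vs 13.25) and the second protein picks up a spurious one, so in this toy example quantile normalisation removes a sample-wide shift but not a protein-specific batch effect.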

Answer, 2020-11-05

Thanks a lot for your answers!

I'm actually using a whole array of normalisations and then checking which one stands out as best (e.g. in the number of differentially expressed proteins and in clustering). I'm using:
- clr
- cyclic loess
- VSN
- cubic splines
- quantile normalisation
- median normalisation
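Of the methods in this list, clr and (in its usual form) median normalisation are within-sample methods. For clr the order question has a neat answer: on the log scale clr only subtracts a per-sample constant, while a per-protein batch mean-centring only subtracts per-protein, per-batch constants, so the two operations commute up to floating-point rounding. The sketch below checks this on hypothetical raw intensities, again assuming a simplified mean-centring stand-in for removeBatchEffect (one batch factor, no design); this need not hold for median, quantile, loess or VSN normalisation.

```python
import math

def centre_columns(mat):
    """Subtract each sample's (column's) mean."""
    n_rows, n_cols = len(mat), len(mat[0])
    means = [sum(row[j] for row in mat) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in mat]

def clr(raw):
    """Centred log-ratio: log-transform, then centre each sample on its own mean log value."""
    return centre_columns([[math.log(v) for v in row] for row in raw])

def remove_batch_means(mat, batches):
    """Per protein, align all batch means to the average of the batch means
    (hypothetical stand-in for removeBatchEffect: one batch factor, no design)."""
    levels = sorted(set(batches))
    out = []
    for row in mat:
        means = {b: sum(v for v, bb in zip(row, batches) if bb == b) / batches.count(b)
                 for b in levels}
        target = sum(means.values()) / len(levels)
        out.append([v - means[b] + target for v, b in zip(row, batches)])
    return out

# Hypothetical raw intensities: 3 proteins x 4 samples from two measurement dates.
batches = ["d1", "d1", "d2", "d2"]
raw = [
    [1000, 1200, 3000, 2800],
    [500, 450, 700, 800],
    [2000, 2100, 2500, 2600],
]
log_raw = [[math.log(v) for v in row] for row in raw]

normalise_first = remove_batch_means(clr(raw), batches)
correct_first = centre_columns(remove_batch_means(log_raw, batches))
max_diff = max(abs(a - b) for ra, rb in zip(normalise_first, correct_first)
               for a, b in zip(ra, rb))
print(f"largest difference between the two orders: {max_diff:.1e}")
```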

So far, VSN has always given the best results. I think it is a between-samples normalisation.

Papyrus's explanation makes sense to me. I will normalise first; maybe one of the methods will already remove at least part of the batch effect. The rest (unfortunately the effect is rather large) I can then smooth out with limma.

Thanks a lot for your advice; it helped me think about the problem from a different angle!

Sebastian