Hello, biostars.
I have raw CEL files from two microarray datasets, which I read into R with the ReadAffy function, and I ultimately want to remove the batch effect between them. Is it correct to merge the two datasets first, then perform quality control, background correction, and normalization, and then remove the batch effect? Or should I perform quality control, background correction, and normalization separately on each dataset, then merge them and remove the batch effect?
Thanks for your answer. So what is the reason for that? For example, in background correction, I would expect the background intensities to be similar across the probes within one dataset but different from the background intensity of the other dataset. If we merge the two datasets first and then perform background correction, the amount of background subtracted from the probe intensities does not make sense to me. The same goes for normalization.

The same concern applies to quality control, given the methods used in QC packages. As I understand it, affyPLM takes the difference between the log expression on a chip and the log expression on a reference chip, which is constructed as the median expression value over all chips. If we merge two datasets that have a batch effect, the reference chip may not be constructed correctly. simpleaffy assumes that the trimmed mean intensity of each array should be constant, which cannot hold when we merge two different datasets: within each dataset the chips have a constant intensity, but because of the batch effect it will not match the intensity of the other dataset. So I think we should not merge the two datasets before quality control.

What do you think about these reasons? Do they make sense?
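To make the reference-chip point concrete, here is a toy sketch of what I mean. It is plain Python rather than R, the numbers are made up, and `rle_medians` is a hypothetical helper that only imitates the "chip minus median reference chip" idea; it is not affyPLM's actual code. Dataset B is assumed to be dataset A plus a constant shift of 2 on the log2 scale.

```python
from statistics import median

# Toy log2 expression values: rows are probesets, columns are chips.
# All numbers are invented for illustration only.
dataset_a = [
    [7.0, 7.1, 6.9],
    [9.0, 9.2, 8.8],
    [5.0, 5.1, 4.9],
]

# Assumption: dataset B is the same biology plus a constant batch shift.
BATCH_SHIFT = 2.0
dataset_b = [[value + BATCH_SHIFT for value in row] for row in dataset_a]


def rle_medians(matrix):
    """Return, for each chip, the median deviation from the reference chip.

    The reference chip is the per-probeset median over all chips in `matrix`.
    """
    reference = [median(row) for row in matrix]
    n_chips = len(matrix[0])
    return [
        median(row[j] - ref for row, ref in zip(matrix, reference))
        for j in range(n_chips)
    ]


# QC done separately: each dataset gets its own reference chip.
separate_a = rle_medians(dataset_a)
separate_b = rle_medians(dataset_b)

# QC done after merging: one reference chip built from all six chips.
merged = [row_a + row_b for row_a, row_b in zip(dataset_a, dataset_b)]
merged_all = rle_medians(merged)

print("separate A:", [round(m, 2) for m in separate_a])
print("separate B:", [round(m, 2) for m in separate_b])
print("merged    :", [round(m, 2) for m in merged_all])
```

In this toy case the per-chip medians stay close to zero when each dataset is handled on its own, but after merging the chips from A sit about one unit below the reference and the chips from B about one unit above it, so every chip looks "off" even though none of them is bad.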