Hello,
First, I'd like to mention that similar questions have been asked several times. However, I decided to ask once again because I didn't find a satisfying answer, and all existing questions are old (asked over 5 years ago). I'm looking for advice rather than a ready-made solution. I have two Affymetrix microarray experiments on the U133 Plus 2.0 array: same platform, same chemistry, same chip type. However, the two experiments were performed in different labs at different times. I'd like to merge them into one experiment. That would give me more diverse samples (for differential expression analysis) and more healthy controls. What is more, I would be able to add samples from one experiment to the other and run PCA and MDS analyses. I know that I can't simply take the normalized results from both experiments and combine them: such experiments are highly sensitive, so I assume the two datasets differ because of a batch effect. In summary, based on your experience, do you know of any Bioconductor packages that would allow such a comparison and help overcome the batch effect?
Thank you in advance,
Adam
What do you mean by 'merge', exactly?
Also, by adding more samples later on, will that not introduce yet another batch effect?
I am not confident that batch can be accurately modeled with just n=1 per batch group.
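One extreme version of this concern is when batch and condition coincide completely, for example if one lab ran only controls and the other only patients. The sketch below is a minimal illustration under that hypothetical layout (the sample assignments and effect sizes are invented): it builds an intercept + condition + batch design matrix, the same idea as an R design like `model.matrix(~ condition + batch)` used with limma, and checks its rank. A confounded design loses a rank, so the two effects cannot be separated; a balanced design stays full rank and ordinary least squares recovers both.

```python
import numpy as np


def design_matrix(condition, batch):
    """Columns: intercept, condition indicator (0/1), batch indicator (0/1)."""
    condition = np.asarray(condition, dtype=float)
    batch = np.asarray(batch, dtype=float)
    return np.column_stack([np.ones_like(condition), condition, batch])


# Hypothetical balanced layout: both conditions appear in both labs
balanced = design_matrix(condition=[0, 0, 1, 1, 0, 0, 1, 1],
                         batch=[0, 0, 0, 0, 1, 1, 1, 1])
# Hypothetical confounded layout: lab A ran only controls, lab B only patients
confounded = design_matrix(condition=[0, 0, 0, 0, 1, 1, 1, 1],
                           batch=[0, 0, 0, 0, 1, 1, 1, 1])

for name, X in [("balanced", balanced), ("confounded", confounded)]:
    rank = np.linalg.matrix_rank(X)
    print(f"{name}: {X.shape[1]} columns, rank {rank}")

# With a full-rank design, least squares separates condition and batch effects.
# Simulated expression of one gene: true condition effect 1.0, batch effect 0.7.
rng = np.random.default_rng(2)
y = 8.0 + 1.0 * balanced[:, 1] + 0.7 * balanced[:, 2] + rng.normal(0, 0.1, 8)
coef, *_ = np.linalg.lstsq(balanced, y, rcond=None)
print("intercept, condition, batch:", np.round(coef, 2))
```

In the confounded layout the condition column is identical to the batch column, so any difference between the groups could be attributed to either one.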
By merging I meant normalizing all .CEL files from the two experiments together to obtain one expression set that contains all samples. I suspect that such merging isn't appropriate, and I'd like to avoid any bias in my final results.
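For context on what "normalized together" means: RMA, as implemented in Bioconductor's affy package, includes a quantile-normalization step, which forces every array onto the same intensity distribution. The Python sketch below shows only that step on simulated log2 intensities (the array counts and the lab shift are invented for illustration, and ties are not averaged). Merging the raw arrays before normalizing gives all samples a common distribution, but it does not by itself remove probe-specific differences between labs.

```python
import numpy as np


def quantile_normalize(matrix: np.ndarray) -> np.ndarray:
    """Quantile-normalize the columns (arrays) of a probes x arrays matrix.

    Each column is replaced by the mean of the sorted columns, placed back
    in that column's original rank order. Ties are not averaged here.
    """
    order = np.argsort(matrix, axis=0)
    reference = np.sort(matrix, axis=0).mean(axis=1)
    normalized = np.empty_like(matrix, dtype=float)
    for j in range(matrix.shape[1]):
        normalized[order[:, j], j] = reference
    return normalized


# Hypothetical log2 intensities: 1000 probes, 4 arrays from lab A, 3 from lab B
rng = np.random.default_rng(0)
lab_a = rng.normal(8.0, 1.0, size=(1000, 4))
lab_b = rng.normal(8.5, 1.2, size=(1000, 3))  # shifted and scaled to mimic a lab effect

combined = np.hstack([lab_a, lab_b])       # merge the raw data first...
normalized = quantile_normalize(combined)  # ...then normalize all arrays together

sorted_cols = np.sort(normalized, axis=0)
print("all arrays share one distribution:",
      np.allclose(sorted_cols, sorted_cols[:, [0]]))
```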
Well, bias exists everywhere... every publication... every experiment... every sentence that we speak. I would encourage you to process them together to see whether there is any evidence of a batch effect. A PCA bi-plot is usually a good measure: if there is a large batch effect, the samples may separate by batch on PC1, which can explain up to 80% of the variation. If the batch effect is mild, it may only appear on a later PC, like 2 or 3, and explain less of the variation.
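A minimal sketch of that check, assuming a samples x genes matrix of normalized log2 expression values. The data here are simulated (6 hypothetical arrays per lab, with a random gene-specific shift added to the second lab), and PCA is computed via SVD of the centered matrix; in R, base `prcomp()` gives the same decomposition. If the lab effect dominates, the batches split cleanly along PC1 and PC1's share of variance stands well above the others.

```python
import numpy as np


def pca(samples_by_genes: np.ndarray, n_components: int = 3):
    """PCA via SVD of the column-centered matrix (rows = samples)."""
    centered = samples_by_genes - samples_by_genes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


rng = np.random.default_rng(1)
n_genes = 2000
batch = np.array([0] * 6 + [1] * 6)             # hypothetical: 6 arrays per lab
expr = rng.normal(8.0, 1.0, size=(12, n_genes))
lab_shift = rng.normal(0.0, 1.5, size=n_genes)  # simulated gene-specific lab effect
expr[batch == 1] += lab_shift

scores, explained = pca(expr)
for pc, frac in enumerate(explained, start=1):
    print(f"PC{pc}: {frac:.1%} of variance")

# A large gap between the batches' mean PC1 scores points to a batch effect
print("mean PC1 score per batch:",
      scores[batch == 0, 0].mean(), scores[batch == 1, 0].mean())
```

With real data the same plot would be colored by lab and by condition; separation by lab rather than by condition is the warning sign.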
If you proceed to process them together, you can include 'batch' as a covariate in all downstream models. Again, however, with just n=1 per batch, it will be difficult to accurately model that batch effect.
Actually, all I needed was this:
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-335