Hi, I am currently doing a meta-analysis of gliomas (microarray gene expression data). I followed the steps in the article below both to select the data sets for the meta-analysis and to process each data set: https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0050184
Based on my selection criteria, 8 data sets from GEO and TCGA made the final cut for the study. Most are from Affymetrix (different versions), but some are from Illumina and Agilent as well. Since they come from different platforms, I did the quality control and processing of the raw data independently for each data set to prepare them for the meta-analysis. However, I have a question about processing one data set, which consists of gene expression data collected at different institutes and will therefore have a batch effect.
Do I need to remove the batch effect within this data set individually? Or is it enough to deal with batch effects when I combine all the data sets for the meta-analysis, given that they all come from different platforms and studies anyway?
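To make the first option concrete, here is a minimal sketch of what I mean by correcting this one data set on its own. It is only a per-gene, per-institute mean centering, not ComBat or any other published method. The function name `center_batches` and the inputs `expr` and `batches` are hypothetical, and I am assuming log-scale values with one institute label per sample.

```python
from collections import defaultdict
from statistics import fmean


def center_batches(expr, batches):
    """Shift each batch so its per-gene mean matches the overall gene mean.

    expr    -- list of gene rows, each a list of log-scale values (one per sample)
    batches -- one batch label (e.g. institute) per sample column

    Illustrative only: if batch is confounded with the biological groups,
    this will also remove real signal.
    """
    n_samples = len(batches)

    # Group sample column indices by batch label.
    cols_by_batch = defaultdict(list)
    for j, label in enumerate(batches):
        cols_by_batch[label].append(j)

    adjusted = []
    for row in expr:
        if len(row) != n_samples:
            raise ValueError("each gene row needs one value per sample")
        gene_mean = fmean(row)
        new_row = list(row)
        for cols in cols_by_batch.values():
            # Offset of this batch's mean from the overall mean for this gene.
            shift = fmean([row[j] for j in cols]) - gene_mean
            for j in cols:
                new_row[j] = row[j] - shift
        adjusted.append(new_row)
    return adjusted


# Toy example: one gene, two institutes with a clear offset between them.
toy = center_batches([[1.0, 2.0, 5.0, 6.0]], ["A", "A", "B", "B"])
```

The alternative I am asking about would skip this step and rely only on whatever handles between-study differences when the 8 data sets are combined.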
Thanks! Nash
P.S. I have gone through some of the answers about batch effects on this platform, as well as the following article: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0017238 They were helpful, and I am now aware of the existing methods for dealing with batch effects, but I am not sure whether to apply them to this data set individually in this case.