Hi All,
I'm looking at an old microarray gene expression dataset and have a question about correcting for chip-to-chip batch effects. The samples were run on the Affymetrix Mouse Genome 430 2.0 Array.
The experimental design is as follows: 3 biological replicates each of paired mouse cutaneous skin (c) and oral mucosa (m), taken at 8 time points (including t0 = control), for a total of 48 samples. My concern is that, according to the chip hybridization data, the cutaneous and mucosal samples from any given time point were all hybridized to the same chip. Because I'm looking for differential expression changes over time, I'm worried that I cannot correct for batch effects with this layout. Please see below:
Chip 1: t0_c1, t0_c2, t0_c3, t1_c1, t1_c2, t1_c3, t0_m1, t0_m2, t0_m3, t1_m1, t1_m2, t1_m3
Chip 2: t2_c1, t2_c2, t2_c3, t3_c1, t3_c2, t3_c3, t2_m1, t2_m2, t2_m3, t3_m1, t3_m2, t3_m3
Chip 3: t4_c1, t4_c2, t4_c3, t5_c1, t5_c2, t5_c3, t4_m1, t4_m2, t4_m3, t5_m1, t5_m2, t5_m3
Chip 4: t6_c1, t6_c2, t6_c3, t7_c1, t7_c2, t7_c3, t6_m1, t6_m2, t6_m3, t7_m1, t7_m2, t7_m3
On a PCA plot of the normalized data, the samples group according to the chip they were run on: t0_c with t1_c, t0_m with t1_m, and so on. Is there any way to correct for this batch effect given how the samples were distributed across the chips?
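To make the layout problem concrete, here is a minimal sketch that rebuilds the sample-to-chip assignment from the listing above and tabulates it. The function names (`build_layout`, `chips_per_time`, `tissues_per_chip`) are made up for illustration, and the label parsing assumes the `t<time>_<tissue><replicate>` naming used in the listing.

```python
from collections import defaultdict

def build_layout():
    """Map each sample label to its chip, following the listing above."""
    layout = {}
    for chip in range(1, 5):
        first = 2 * (chip - 1)  # chip 1 -> t0, t1; chip 2 -> t2, t3; ...
        for tissue in ("c", "m"):
            for t in (first, first + 1):
                for rep in (1, 2, 3):
                    layout[f"t{t}_{tissue}{rep}"] = chip
    return layout

def chips_per_time(layout):
    """For each time point, collect the set of chips it appears on."""
    seen = defaultdict(set)
    for sample, chip in layout.items():
        t = int(sample.split("_")[0][1:])  # "t3_c2" -> 3
        seen[t].add(chip)
    return dict(seen)

def tissues_per_chip(layout):
    """For each chip, collect the set of tissues it carries."""
    seen = defaultdict(set)
    for sample, chip in layout.items():
        tissue = sample.split("_")[1][0]  # "t3_c2" -> "c"
        seen[chip].add(tissue)
    return dict(seen)

layout = build_layout()
time_to_chips = chips_per_time(layout)
chip_to_tissues = tissues_per_chip(layout)

for t in sorted(time_to_chips):
    print(f"t{t}: chips {sorted(time_to_chips[t])}")
for chip in sorted(chip_to_tissues):
    print(f"chip {chip}: tissues {sorted(chip_to_tissues[chip])}")
```

Every time point lands on exactly one chip, while both tissues appear on every chip. So tissue is balanced across chips, but time is nested within chip.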
Hi mforde84,
Thanks for your response. I'm afraid I might not have been clear in my question. I was planning to use ComBat to correct for chip-to-chip batch effects, but given the sample layout across chips (see original post) and the experimental question I'm trying to answer, I'm worried I won't be able to correct for the batch effects without affecting the analysis.
My experimental question is to see what differential expression changes occur in each tissue across different time points. My variable of interest is time.
The issue is that all of the samples for any given time point are on a single chip (t0 and t1 on chip 1, t2 and t3 on chip 2, etc.), so I don't think I can correct for chip-to-chip variation. Does that make more sense?
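One way to state the worry more formally is to check whether a chip term adds any information to a model that already contains time. The sketch below is only an illustration under simplifying assumptions: it uses plain indicator (dummy) coding with t0 and chip 1 as reference levels, leaves tissue out, and computes rank with a small hand-written exact elimination routine (`matrix_rank` is a helper written for this example, not a library function).

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank via Gauss-Jordan elimination on exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry here.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

time_only = []      # intercept + indicators for t1..t7
time_and_chip = []  # same, plus indicators for chips 2..4
for t in range(8):
    chip = t // 2 + 1        # t0, t1 -> chip 1; t2, t3 -> chip 2; ...
    for _ in range(6):       # 2 tissues x 3 replicates per time point
        time_cols = [1] + [int(t == k) for k in range(1, 8)]
        chip_cols = [int(chip == k) for k in range(2, 5)]
        time_only.append(time_cols)
        time_and_chip.append(time_cols + chip_cols)

rank_time = matrix_rank(time_only)
rank_time_chip = matrix_rank(time_and_chip)
n_cols_time_chip = len(time_and_chip[0])

print("rank with time only:    ", rank_time)
print("rank with time and chip:", rank_time_chip)
print("columns with time and chip:", n_cols_time_chip)
```

Both ranks come out as 8 even though the second matrix has 11 columns: each chip indicator is just the sum of two time indicators. Under these assumptions, a chip effect cannot be estimated separately from the time effects, which is exactly the situation described above.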