Hi
I am trying to detect batch effects in my microarray samples. Each sample belongs to several groups, including one of five batches. I currently melt my expression data frame (log2 normalised values) so that I have sample IDs in column 1 and expression values in column 2. I then add information such as serology or batch in extra columns, and use an ANOVA to gauge how large the batch effect is relative to the other variables.
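The reshaping described above could look roughly like the sketch below. It uses a tiny made-up matrix and base R only; `expr`, `pheno`, and the factor levels are hypothetical stand-ins, not the real data, and the model is cut down to two covariates so that the toy example stays estimable.

```r
# Toy stand-in for a log2 expression matrix: 3 probes x 4 samples (hypothetical data)
set.seed(1)
expr <- matrix(rnorm(12, mean = 8), nrow = 3,
               dimnames = list(paste0("probe", 1:3), paste0("S", 1:4)))

# Hypothetical sample annotation, one row per sample
pheno <- data.frame(sampleID  = paste0("S", 1:4),
                    SEROLOGY  = factor(c("pos", "neg", "pos", "neg")),
                    HYB.BATCH = factor(c("b1", "b1", "b2", "b2")))

# "Melt" to long format: as.vector() reads the matrix column by column,
# so each sample ID is repeated once per probe
long <- data.frame(sampleID = rep(colnames(expr), each = nrow(expr)),
                   value    = as.vector(expr))

# Attach the annotation columns to every expression value
merged <- merge(long, pheno, by = "sampleID")

# Same kind of model as in the post, reduced to the two toy covariates
fit <- aov(value ~ SEROLOGY + HYB.BATCH, data = merged)
print(summary(fit))
```

On the real data, `merged` would presumably hold one row per probe per sample, which would account for the roughly 5.4 million residual degrees of freedom in the output.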
aov.ex2 = aov(value~CELL.TYPE+VISIT+SEROLOGY+HYB.BATCH,data=merged)
                 Df   Sum Sq Mean Sq F value Pr(>F)
CELL.TYPE         4     2221   555.3  109.61 <2e-16 ***
VISIT             4      552   138.0   27.25 <2e-16 ***
SEROLOGY          1      347   347.3   68.55 <2e-16 ***
BATCH             4     2123   530.9  104.79 <2e-16 ***
Residuals   5391730 27314376     5.1
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
                 Df   Sum Sq Mean Sq F value Pr(>F)
CELL.TYPE         4     1318   329.6  65.683 <2e-16 ***
VISIT             4      410   102.6  20.440 <2e-16 ***
SEROLOGY          1      467   466.7  93.004 <2e-16 ***
BATCH             4        3     0.6   0.128  0.972
Residuals   5391730 27058055     5.0
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the second table the p-value for the batch effect has gone up a lot (to 0.972). However, I am a bit confused: the PCA plot did not show any batch effect, yet the first ANOVA gives a highly significant p-value for batch. The F value for batch is also very high, higher than for clinical variables such as VISIT and SEROLOGY, which I would not really expect. Any comments or thoughts? Am I doing this correctly?
Cheers,
Robert
What are the two different analyses? BATCH is not significant in the second one.
Please post some of the data so we can see the structure.
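One way to put the F values in context, offered only as a rough sketch: with about 5.4 million residual degrees of freedom, even very small mean shifts can reach p < 2e-16, so the share of the total sum of squares attributed to each term may be more informative than the p-value. The snippet below computes that share (eta squared) from the sums of squares in the first table as posted; nothing else about the data is assumed.

```r
# Sums of squares copied from the first ANOVA table in the post
ss <- c(CELL.TYPE = 2221, VISIT = 552, SEROLOGY = 347,
        BATCH = 2123, Residuals = 27314376)

# Eta squared: each term's sum of squares as a fraction of the total
eta_sq <- ss / sum(ss)

# Print as percentages of the total sum of squares
print(signif(100 * eta_sq, 2))
```

By this measure every term in the first table, BATCH included, accounts for less than 0.01% of the total sum of squares, which would be consistent with a PCA plot that shows no visible batch structure.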