Using the PVCA package, I see a few batch effects in my data. When correcting for batch effects with limma's removeBatchEffect, it is important that the batches are distributed evenly across the groups of interest. If PVCA shows several batch factors, must I check the distribution of all of them together, or only of each factor in isolation?
E.g., I see that both date.processed and cell number have an influence.
To check the distribution, I could look at each factor individually:
table(batch_effectors_sample$clinical_phenotype_general, batch_effectors_sample$date_processed_y_m_d)
and
table(batch_effectors_sample$clinical_phenotype_general, batch_effectors_sample$cell_number_clean)
In both cases the samples are well balanced.
But if I look at both factors jointly with
table(batch_effectors_sample$clinical_phenotype_general, batch_effectors_sample$date_processed_y_m_d, batch_effectors_sample$cell_number_clean)
the samples are rather unevenly distributed (e.g., 2 dates contain only patients).
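To make the comparison concrete, here is a minimal sketch on made-up data. The data frame toy and all of its values are hypothetical; only the column names mirror mine. It contrasts the two marginal tables with the joint table and also checks whether an additive design containing the phenotype and both batch factors is still full rank. The rank check is only my assumption about what might be a relevant criterion, not something I know to be the right test.

```r
# Hypothetical toy data: column names mirror my real data, values are invented.
toy <- data.frame(
  clinical_phenotype_general = rep(c("patient", "patient", "control", "control"), each = 2),
  date_processed_y_m_d       = rep(c("d1", "d2", "d1", "d2"), each = 2),
  cell_number_clean          = rep(c("low", "high", "high", "low"), each = 2)
)

# Marginal tables: phenotype against each batch factor on its own.
marg_date <- table(toy$clinical_phenotype_general, toy$date_processed_y_m_d)
marg_cell <- table(toy$clinical_phenotype_general, toy$cell_number_clean)

# Joint table: phenotype against both batch factors at once.
joint <- table(toy$clinical_phenotype_general,
               toy$date_processed_y_m_d,
               toy$cell_number_clean)

print(marg_date)
print(marg_cell)
print(ftable(joint))  # flattened view of the three-way table

# Number of phenotype x date x cell-number combinations with no samples.
n_empty <- sum(joint == 0)

# Additive design with phenotype and both batch factors; check that no
# column is a linear combination of the others.
design <- model.matrix(
  ~ clinical_phenotype_general + date_processed_y_m_d + cell_number_clean,
  data = toy
)
full_rank <- qr(design)$rank == ncol(design)
```

In this toy case both marginal tables are perfectly balanced, half of the joint cells are empty, and the additive design is nevertheless full rank. So the two ways of checking can disagree, which is exactly what I see in my data.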
If only the distribution within each factor on its own matters, I could correct for both; if the joint distribution matters, I would have to choose one factor to correct for and leave the other uncorrected. Which approach is correct?
Thanks for your help!
Sebastian