Conditions for valid batch effect correction
5.7 years ago

Using the PVCA package, a few batch effects appear in my data. When correcting for batch effects with limma's removeBatchEffect, it is important that the groups of interest are distributed evenly across batches. If PVCA shows several batch factors, must I check the distribution across all of them jointly, or only for each factor in isolation?

E.g., I see that both date.processed and cell number have an influence.

To check the distribution, I could look at each factor individually:

table(batch_effectors_sample$clinical_phenotype_general, batch_effectors_sample$date_processed_y_m_d)

and

table(batch_effectors_sample$clinical_phenotype_general, batch_effectors_sample$cell_number_clean)

In both cases the samples are well balanced.

But if I look at both factors in conjunction with

table(batch_effectors_sample$clinical_phenotype_general, batch_effectors_sample$date_processed_y_m_d, batch_effectors_sample$cell_number_clean)

the samples are rather unevenly distributed (e.g., two dates contain only patients).
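To make the difference between the marginal and the joint view concrete, here is a minimal sketch on invented data. Only the column names mirror my real data frame; the `toy` data frame, its values, and the helper names `counts`, `n_pheno` and `single` are assumptions for illustration. It is constructed so that each factor looks balanced on its own while every date/cell-number combination contains a single phenotype:

```r
# Toy stand-in for batch_effectors_sample; all values are invented for illustration.
toy <- data.frame(
  clinical_phenotype_general = rep(c("patient", "control", "control", "patient"), each = 2),
  date_processed_y_m_d       = rep(c("2017-01-10", "2017-02-14"), each = 4),
  cell_number_clean          = rep(c("low", "high", "low", "high"), each = 2)
)

# Marginal views: each batch factor on its own looks perfectly balanced.
by_date <- table(toy$clinical_phenotype_general, toy$date_processed_y_m_d)
by_cell <- table(toy$clinical_phenotype_general, toy$cell_number_clean)

# Joint view: sample counts per date x cell-number x phenotype combination.
counts <- xtabs(~ date_processed_y_m_d + cell_number_clean + clinical_phenotype_general,
                data = toy)

# For every date x cell-number stratum, count how many phenotypes are present.
n_pheno <- apply(counts > 0, c(1, 2), sum)

# Strata that contain samples from exactly one phenotype.
single <- which(n_pheno == 1, arr.ind = TRUE)

print(by_date)
print(by_cell)
print(n_pheno)
```

On my real data the same steps would use `batch_effectors_sample` in place of `toy`.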

If only the distribution within each single factor matters, I could correct for both factors. If the joint distribution matters, I would have to choose one factor to correct for and leave the other in the data. Which approach is more correct?
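One check I could run, as a sketch only: whether the phenotype effect is still estimable once the batch factors are in the design. The code below uses base R on invented data (the `toy` data frame, the `batch_combined` column and the `is_full_rank` helper are my own hypothetical names, not part of limma or PVCA). It compares a design where the two batch factors enter additively with one where each date/cell-number combination is treated as its own batch. A full-rank design only means the phenotype is not completely confounded; it says nothing about how well balanced the design is.

```r
# Toy stand-in for batch_effectors_sample; all values are invented for illustration.
toy <- data.frame(
  clinical_phenotype_general = rep(c("patient", "control", "control", "patient"), each = 2),
  date_processed_y_m_d       = rep(c("2017-01-10", "2017-02-14"), each = 4),
  cell_number_clean          = rep(c("low", "high", "low", "high"), each = 2)
)

# Hypothetical helper: TRUE if all columns of the design matrix are linearly independent.
is_full_rank <- function(formula, data) {
  X <- model.matrix(formula, data)
  qr(X)$rank == ncol(X)
}

# Both batch factors as separate additive terms.
additive_ok <- is_full_rank(
  ~ clinical_phenotype_general + date_processed_y_m_d + cell_number_clean, toy)

# Each date x cell-number combination treated as one combined batch.
toy$batch_combined <- interaction(toy$date_processed_y_m_d, toy$cell_number_clean,
                                  drop = TRUE)
combined_ok <- is_full_rank(~ clinical_phenotype_general + batch_combined, toy)

print(c(additive = additive_ok, combined = combined_ok))
```

In this toy case the additive design is full rank, while the combined-batch design is not, because every combined batch contains a single phenotype.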

Thanks for your help!

Sebastian

r batch-effect • 735 views