I have publicly available data that was collected in two batches. The dataset is a superseries made up of two subseries from the same study, both addressing the same biological question (ALS vs. control).
I'm aware that normalization and transformation are a must, but should I:
- Normalize and transform each batch separately, combine them, and then correct for batch effects using ComBat
OR
- Combine the batches, normalize and transform them together, and then correct for batch effects?
Thanks for any help
EDIT:
I've done some digging, and it looks like global quantile normalization can reduce meaningful biological variation across groups (in the above case, between ALS and control).
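To illustrate the concern, here is a minimal Python sketch on simulated data. Everything in it is an assumption for illustration: the `quantile_normalize` helper, the simulated +1 shift in the ALS samples, and the group sizes. It is not taken from any of the papers. Global quantile normalization forces every sample onto the same distribution, so a genuine overall shift between groups disappears.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Ties are broken by order of appearance, which is a simplification.
    """
    order = np.argsort(x, axis=0, kind="stable")      # per-sample sort order
    ranks = np.argsort(order, axis=0, kind="stable")  # rank of each gene within its sample
    reference = np.sort(x, axis=0).mean(axis=1)       # mean of the k-th smallest values across samples
    return reference[ranks]                           # map each rank to the reference value

# Simulated log-scale expression: 3 control and 3 ALS samples (hypothetical data).
rng = np.random.default_rng(0)
n_genes = 1000
control = rng.normal(8.0, 1.0, size=(n_genes, 3))
als = rng.normal(9.0, 1.0, size=(n_genes, 3))  # assumed global shift of +1 in ALS
expr = np.hstack([control, als])

normalized = quantile_normalize(expr)

# Difference in overall mean expression, ALS minus control.
shift_before = expr[:, 3:].mean() - expr[:, :3].mean()
shift_after = normalized[:, 3:].mean() - normalized[:, :3].mean()
print(f"group shift before: {shift_before:.3f}, after: {shift_after:.3f}")
```

After normalization every column holds the same set of values, only in a different order, so the group-level shift is removed whether or not it was biological.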
Several papers (open access, listed below) therefore recommend within-class quantile normalization, which normalizes the data while preserving meaningful biological differences between classes. The first paper listed also shows that these methods can reduce batch effects for batches derived from the same experimental project (combining data for meta-analysis is a different beast, and more complicated).
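A rough sketch of the within-class idea in Python, assuming class labels (ALS or control) are known for every sample. The function names and the simulated data are hypothetical, and this is a simplified illustration rather than the implementation used in any of the papers (smooth quantile normalization in particular is more involved). Each class is quantile-normalized on its own, so samples within a class share a distribution while differences between classes are left alone.

```python
import numpy as np

def quantile_normalize(x):
    """Plain quantile normalization of the columns of a genes x samples matrix."""
    order = np.argsort(x, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    reference = np.sort(x, axis=0).mean(axis=1)
    return reference[ranks]

def classwise_quantile_normalize(x, labels):
    """Quantile-normalize each class of samples separately (hypothetical helper)."""
    labels = np.asarray(labels)
    out = np.empty_like(x, dtype=float)
    for cls in np.unique(labels):
        cols = labels == cls                      # boolean mask for this class
        out[:, cols] = quantile_normalize(x[:, cols])
    return out

# Simulated log-scale expression with an assumed +1 shift in ALS samples.
rng = np.random.default_rng(0)
n_genes = 1000
expr = np.hstack([
    rng.normal(8.0, 1.0, size=(n_genes, 3)),  # control
    rng.normal(9.0, 1.0, size=(n_genes, 3)),  # ALS
])
labels = ["control"] * 3 + ["ALS"] * 3

normalized = classwise_quantile_normalize(expr, labels)

is_als = np.asarray(labels) == "ALS"
diff_before = expr[:, is_als].mean() - expr[:, ~is_als].mean()
diff_after = normalized[:, is_als].mean() - normalized[:, ~is_als].mean()
print(f"group shift before: {diff_before:.3f}, after: {diff_after:.3f}")
```

Here the overall ALS vs. control difference is the same before and after, because each class keeps its own reference distribution. Batch correction (e.g. ComBat) would still be a separate step.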
Hope this information helps!
- How to do quantile normalization correctly for gene expression data analyses - Scientific Reports
- Smooth quantile normalization - Biostatistics
- quantro: a data-driven approach to guide the choice of an appropriate normalization method - Genome Biology