We want to find any differences between groups with different treatments:
Control: 2 samples
Treatment 1: 2 samples
Treatment 2: 2 samples
Each sample was collected from a different mouse, and all samples were sequenced on a HiSeq. The first step of our analysis is clustering to identify cell types in the samples. My question is: before clustering, should I pool all 6 samples from the 3 conditions and remove batch effects, e.g. with Harmony or a similar tool? Alternatively, if we pool only the samples within each study group and run the analysis three times (once each for control, treatment 1, and treatment 2), would the clustering results differ significantly?
Thanks! Yes, my current workflow uses Harmony. I pooled all samples and integrated them by sample ID, ignoring the treatment of each sample. This gives a unified clustering landscape across the different treatments. Our biologists asked about this because they hypothesize that the landscape should differ between conditions. Maybe we should use significant differences in cell-type proportions to describe "a different landscape"?
You can definitely perform differential abundance testing. Bioconductor details a method here using the negative binomial distribution. You can also answer the question with a simple Monte Carlo simulation.
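For the Monte Carlo route, here is a minimal sketch of one way to do it, not necessarily what the answer above had in mind. It computes the fraction of each sample's cells that fall in a given cluster, then shuffles the condition labels across samples (mice) to build a null distribution for the difference in mean proportion. The function names, the `(sample_id, cluster_label)` input format, and the choice to shuffle at the sample level rather than the cell level are all assumptions made for illustration.

```python
import random
from collections import Counter


def sample_proportions(cells, cluster):
    """Fraction of each sample's cells assigned to `cluster`.

    `cells` is an iterable of (sample_id, cluster_label) pairs
    (a hypothetical input format chosen for this sketch).
    """
    totals, hits = Counter(), Counter()
    for sample_id, label in cells:
        totals[sample_id] += 1
        if label == cluster:
            hits[sample_id] += 1
    return {s: hits[s] / totals[s] for s in totals}


def permutation_test(props, condition_of, cond_a, cond_b, n_perm=2000, seed=0):
    """Monte Carlo permutation test on per-sample proportions.

    Condition labels are shuffled across samples, so mouse-to-mouse
    variability is respected. Returns (observed difference, p-value).
    """
    rng = random.Random(seed)
    ids = sorted(s for s in props if condition_of[s] in (cond_a, cond_b))
    labels = [condition_of[s] for s in ids]

    def mean_diff(lbls):
        a = [props[s] for s, l in zip(ids, lbls) if l == cond_a]
        b = [props[s] for s, l in zip(ids, lbls) if l == cond_b]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = mean_diff(labels)
    extreme = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        # Two-sided: count shuffles at least as extreme as the observed value.
        if abs(mean_diff(shuffled)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)
```

One caveat for this design: with only 2 samples per group there are just 6 distinct ways to assign the labels in a pairwise comparison, so the two-sided p-value cannot go much below 1/3 however large the effect. Shuffling labels across individual cells would give smaller p-values, but it treats cells from the same mouse as independent, which tends to overstate significance.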