I need to integrate snRNA-seq data from 3 samples (S11, S12, S13) of species1 and 3 samples (S21, S22, S23) of species2. I therefore expect library-prep batch effects among samples of the same species, plus both biological and technical batch effects between the two species. When I use harmony for integration, I need to specify group.by.vars, which can be 'samples' or 'species'. Which option is correct:
- Run RunHarmony separately on S11, S12, S13 and on S21, S22, S23, and then integrate the two resulting objects?
- Integrate all samples at the same time. In that case, can I set
group.by.vars = c('sample','species')
?
What is the best way to integrate all samples and remove the batch effects associated with sample handling and inter-species heterogeneity?