Hello,
I cross-posted this at https://discourse.scverse.org/t/help-with-harmony-and-scanpy/3477 but have not received a reply, so I hope it is OK to post here too.
I am interested in using Harmony to integrate some samples, but I am not sure I am doing it correctly. I have 6 samples across 2 conditions (3 control, 3 treated).
I do the following steps:
- Merge samples
- QC
- Normalization + log transform
- Regress out effects of total counts per cell
- Scale the data to unit variance
- PCA
- UMAP
- Harmony using sample as the batch key (sce.pp.harmony_integrate(adata, 'sample'))
- UMAP
- Leiden Clustering
- Identify marker genes
I then use the marker genes to assign cell types and run differential expression (DE) analysis between control and treatment within each cluster.
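
A minimal sketch of the steps listed above, written against the public scanpy API. It assumes `adata` is the already merged and QC-filtered object with raw counts in `adata.X` and a `sample` column in `adata.obs`. The function name `run_pipeline`, the `target_sum` and `max_value` settings, the `X_umap_uncorrected` key, and the choice of the Wilcoxon test are illustrative assumptions, not part of the workflow described in this post.

```python
def run_pipeline(adata, batch_key="sample"):
    """Sketch of the listed workflow; modifies `adata` in place and returns it."""
    # Imported lazily so the function can be defined without scanpy installed.
    import scanpy as sc
    import scanpy.external as sce

    # Adds obs["total_counts"], which is regressed out below.
    sc.pp.calculate_qc_metrics(adata, percent_top=None, inplace=True)

    # Normalization + log transform (target_sum is an assumed setting).
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)

    # Keep the log-normalized values before regression and scaling.
    adata.raw = adata

    # Regress out total counts per cell, then scale to unit variance.
    sc.pp.regress_out(adata, ["total_counts"])
    sc.pp.scale(adata, max_value=10)

    # PCA and a UMAP on the uncorrected PCs for a before/after comparison.
    sc.tl.pca(adata)
    sc.pp.neighbors(adata)
    sc.tl.umap(adata)
    adata.obsm["X_umap_uncorrected"] = adata.obsm["X_umap"].copy()

    # Harmony reads obsm["X_pca"] and writes obsm["X_pca_harmony"].
    sce.pp.harmony_integrate(adata, batch_key)

    # Build the graph on the corrected PCs, then UMAP and Leiden clustering.
    sc.pp.neighbors(adata, use_rep="X_pca_harmony")
    sc.tl.umap(adata)
    sc.tl.leiden(adata)

    # Marker genes per cluster; uses adata.raw by default when it is set.
    sc.tl.rank_genes_groups(adata, "leiden", method="wilcoxon")
    return adata
```

Note that `sc.pp.neighbors` is pointed at `X_pca_harmony` explicitly; without `use_rep` it would build the graph on the uncorrected `X_pca`.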
There are a few steps I am concerned about. Does Harmony use the log-normalized data or the regressed data? Many scanpy commands default to adata.raw, which holds the log-normalized data. Also, Harmony does not seem to change adata.X or adata.raw.X, which would mean that I can still run DE on the object I get after Harmony, right?
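
One way to check the second point empirically is to snapshot adata.X, run Harmony, and compare. The sketch below assumes PCA has already been run (so `adata.obsm["X_pca"]` exists) and that the matrix is small enough to densify; the helper name `check_harmony_outputs` and the keys of the returned dict are hypothetical.

```python
def check_harmony_outputs(adata, batch_key="sample"):
    """Run Harmony and report whether adata.X was left untouched."""
    # Imported lazily so the function can be defined without these packages.
    import numpy as np
    import scipy.sparse as sp
    import scanpy.external as sce

    def to_dense(matrix):
        # adata.X may be sparse or dense depending on earlier steps.
        return matrix.toarray() if sp.issparse(matrix) else np.asarray(matrix)

    # Snapshot the expression matrix before integration.
    x_before = to_dense(adata.X).copy()

    # Requires obsm["X_pca"]; the corrected PCs go to obsm["X_pca_harmony"].
    sce.pp.harmony_integrate(adata, batch_key)

    return {
        "X_unchanged": bool(np.array_equal(x_before, to_dense(adata.X))),
        "has_corrected_pcs": "X_pca_harmony" in adata.obsm,
    }
```

If `X_unchanged` comes back `True`, the expression values used for DE are the same ones that were in the object before Harmony ran.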
Thank you
Thank you for your reply. After running Harmony I do the below:
I usually normalize and log-transform my data, and then I regress and scale. The Harmony quickstart linked below says that it uses log-normalized and scaled values: https://portals.broadinstitute.org/harmony/articles/quickstart.html#installation-1
Does that mean that I should not regress the data before scaling?