Question

How to use harmony with scanpy?

0

Entering edit mode

5 months ago

bioinfo ▴ 160

Hello,

I have crossposted in https://discourse.scverse.org/t/help-with-harmony-and-scanpy/3477 but I have not gotten a reply so I hope it is ok to post here too.

I am interested in using harmony to integrate some samples but I am not sure if I am doing it correctly. I have 6 samples with 2 conditions (3 control/3 treated)

I do the following steps:

Merge samples
QC
Normalization+ Log
Regress out effects of total counts per cell
Scale the data to unit variance
PCA
UMAP
Harmony using sample as a batch key (sce.pp.harmony_integrate(adata, 'sample'))
UMAP
Leiden Clustering
Identify marker genes

Then I use that information to assign cell types and then do DE analysis for each cluster between the control and treatment.

There are a few steps I am concerned. Does harmony use the log normalized or the regressed data? In a lot of scanpy commands they use as default the adata.raw with the log normalized data. Also, it does not seem like harmony changes the adata.X or adata.raw.X which would mean that I can still do DE using the object i get after harmony right?

Thank you

scanpy harmony • 2.6k views

ADD COMMENT • link updated 5 months ago by Rezin • 0 • written 5 months ago by bioinfo ▴ 160

score 1 · Answer 1 · 2025-02-11

1

Entering edit mode

5 months ago

LChart 5.0k

Harmony performs batch effect correction on an embedding space, .obsm['X_pca'] by default, but you can change this by passing basis=. I also would recommend performing batch correction on PCs rather than on the UMAP embeddings. The neighbor calculation uses 'X_pca' (you will pass use_rep='X_pca_harmony' for harmony), so if you don't correct that embedding, Harmony won't have any impact on the clustering, and therefore any DE based on your leiden clusters will not change.

ADD COMMENT • link 5 months ago by LChart 5.0k

0

Entering edit mode

Thank you for your reply. After running Harmony I do the below:

adata.obsm['X_pca'] = adata.obsm['X_pca_harmony']
sc.pp.neighbors(adata, random_state=0)
sc.tl.umap(adata)

I usually normalize and log my data and then I do regression and scaling. In the link below it says that they use log normalized and scaled values: https://portals.broadinstitute.org/harmony/articles/quickstart.html#installation-1

Does that mean that I should not regress the data before scaling?

ADD REPLY • link 5 months ago by bioinfo ▴ 160

score 0 · Answer 2 · 2025-02-25

0

Entering edit mode

5 months ago

Rezin • 0

You can explicitly use X_pca_harmony as below:

sc.tl.umap(adata, init_pos='X_pca_harmony')
sc.pl.umap(adata, ['Your_group'])

Anyway this worked for me.

ADD COMMENT • link 5 months ago by Rezin • 0