Post does not exist.
How to use harmony with scanpy?
2
0
Entering edit mode
4 weeks ago
bioinfo ▴ 150

Hello,

I have crossposted in https://discourse.scverse.org/t/help-with-harmony-and-scanpy/3477 but I have not gotten a reply so I hope it is ok to post here too.

I am interested in using harmony to integrate some samples but I am not sure if I am doing it correctly. I have 6 samples with 2 conditions (3 control/3 treated)

I do the following steps:

  1. Merge samples
  2. QC
  3. Normalization+ Log
  4. Regress out effects of total counts per cell
  5. Scale the data to unit variance
  6. PCA
  7. UMAP
  8. Harmony using sample as a batch key (sce.pp.harmony_integrate(adata, 'sample'))
  9. UMAP
  10. Leiden Clustering
  11. Identify marker genes

Then I use that information to assign cell types and then do DE analysis for each cluster between the control and treatment.

There are a few steps I am concerned. Does harmony use the log normalized or the regressed data? In a lot of scanpy commands they use as default the adata.raw with the log normalized data. Also, it does not seem like harmony changes the adata.X or adata.raw.X which would mean that I can still do DE using the object i get after harmony right?

Thank you

scanpy harmony • 748 views
ADD COMMENT
1
Entering edit mode
4 weeks ago
LChart 4.9k

Harmony performs batch effect correction on an embedding space, .obsm['X_pca'] by default, but you can change this by passing basis=. I also would recommend performing batch correction on PCs rather than on the UMAP embeddings. The neighbor calculation uses 'X_pca' (you will pass use_rep='X_pca_harmony' for harmony), so if you don't correct that embedding, Harmony won't have any impact on the clustering, and therefore any DE based on your leiden clusters will not change.

ADD COMMENT
0
Entering edit mode

Thank you for your reply. After running Harmony I do the below:

adata.obsm['X_pca'] = adata.obsm['X_pca_harmony']
sc.pp.neighbors(adata, random_state=0)
sc.tl.umap(adata)

I usually normalize and log my data and then I do regression and scaling. In the link below it says that they use log normalized and scaled values: https://portals.broadinstitute.org/harmony/articles/quickstart.html#installation-1

Does that mean that I should not regress the data before scaling?

ADD REPLY
0
Entering edit mode
18 days ago
Rezin • 0

You can explicitly use X_pca_harmony as below:

sc.tl.umap(adata, init_pos='X_pca_harmony')
sc.pl.umap(adata, ['Your_group'])

Anyway this worked for me.

ADD COMMENT

Login before adding your answer.

Traffic: 1535 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6