Can't reproduce harmony results in Seurat
2
0
Entering edit mode
3 months ago
fifty_fifty ▴ 70

Hello,

I integrated some scRNA-seq data using Harmony on Seurat V4 back in 2022, but I didn’t save the Seurat object at that time. When I needed to perform further analysis, I reran Harmony on the same data, but the clustering results were quite different. Specifically, the number of clusters decreased from 20 to 18, and most importantly, I can’t find cluster 17, which is crucial for our analysis.

enter image description here enter image description here

Is there any way to recover the clustering results from 2022? If reproducing the previous results isn't possible, I plan to increase the resolution parameter until I see the cluster of interest and annotate the new clusters. How can I justify that the new annotation closely matches the previous one?

Thank you for your help!

Seurat scRNA-seq harmony • 1.1k views
ADD COMMENT
0
Entering edit mode

Are you using the exact same computer (same processor and OS) ?

ADD REPLY
0
Entering edit mode

No, I have a diferent MacBook now. Do you think that can affect harmony embeddings?

ADD REPLY
2
Entering edit mode

I have seen something similar before. If the processor chip from your MacBooks are different, it might affect your results. Like the way the processors are handling floating (Consistency of Seurat SCTransform across computers/environments)

ADD REPLY
1
Entering edit mode

looks like my it's related to M2 processor. I ran the script on Linux and it was able to find the cluster of interest. Still, the number of clusters is 19 but I used Seurat V5, maybe if I go back to V4 I'll be able to reproduce my initial result. Thank you!

enter image description here

ADD REPLY
0
Entering edit mode
3 months ago

Hi, I have in the past been able to reproduce old results using the most recent version of Seurat (v5). Can you share your code that you were (are?) using?

ADD COMMENT
0
Entering edit mode

I am using the same code and data that I had 2 years ago

Kim_brain_data <- Kim_data[,grepl("NS_",colnames(Kim_data))]
Kim_brain <- CreateSeuratObject(Kim_brain_data, project="Kim_brain", min.cells=3, min.features = 200)
Kim_brain[["percent.mt"]]<-PercentageFeatureSet(Kim_brain, pattern="^MT-")
Kim_brain[["percent.rb"]]<-PercentageFeatureSet(Kim_brain, pattern="RP[SL]")
VlnPlot(Kim_brain, features = c("nFeature_RNA", "nCount_RNA", "percent.mt", "percent.rb"), ncol = 4, pt.size = 0.3)
Kim_brain_sub <- subset(Kim_brain, subset = nFeature_RNA < 7500 & nFeature_RNA > 500 & nCount_RNA < 50000 & percent.mt < 15 & percent.rb < 50)
Kim_annot_brain_sub <- Kim_annot_brain[rownames(Kim_annot_brain) %in% colnames(Kim_brain_sub),]


Kim_brain_sub <- AddMetaData(Kim_brain_sub, Kim_annot_brain_sub)
Kim_brain_sub <- Seurat::NormalizeData(Kim_brain_sub, verbose = FALSE) %>%
  FindVariableFeatures(selection.method = "vst", nfeatures = 2000) %>% 
  ScaleData(verbose = FALSE) %>% 
  RunPCA(pc.genes = Kim_brain_sub@var.genes, npcs = 20, verbose = FALSE)

Kim_brain_sub <- Kim_brain_sub %>% 
  RunHarmony("Sample", plot_convergence = TRUE)
harmony_embeddings <- Embeddings(Kim_brain_sub, 'harmony')

Kim_brain_sub <- Kim_brain_sub %>% 
  RunUMAP(reduction = "harmony", dims = 1:20) %>% 
  FindNeighbors(reduction = "harmony", dims = 1:20) %>% 
  FindClusters(resolution = 0.5) %>% 
  identity()
DimPlot(object = Kim_brain_sub, reduction = "umap", pt.size = .1, label = T)
ADD REPLY
0
Entering edit mode

When you run UMAP these days you get this:

Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session

This is probably why your resulting UMAP looks different as well as the already mentioned difference in how the runs are seeded

ADD REPLY
0
Entering edit mode

thank you, I didn't notice that warning.

So, I ran it on Seurat 4.2.0 and got umap viz that doesn't look like the initial clusters either. Again, it found 18 clusters instead of 20.

enter image description here

here, I set.seed(1) in the very beginning before CreateSeuratObject. the visualization looks different but the number and structure of clusters did not change.

enter image description here

ADD REPLY
0
Entering edit mode

Can I just double check. You're basically saying you're getting a different number of clusters. But the general stucture looks similar. Do you know what the markers of cluster 17 were? Can you check for presense absense based on gene expression rather than visually looking for it?

ADD REPLY
0
Entering edit mode

The markers are expressed in two cell populations. One can be found with no problem but the second one is smaller. In my initial run from 2022, it was detected as a separate cluster 17

enter image description here

I still see those two cell populations in the new clustering visualization but the smaller cluster can't be detected as a separate cluster. It is a part of cluster 3 here

enter image description here

I technically can increase resolution until I see it but I was hoping that I would be able to reproduce my 2022 results. We've already published the 2022 results and need cell-cell interaction analysis with the same data.

ADD REPLY
0
Entering edit mode
3 months ago

Have you tried iterating over the seed with set.seed()? I doubt there is a way to directly infer the seed but you could probably identify your UMAP in a lineup.

Set.seed in Seurat

ADD COMMENT
1
Entering edit mode

IIRC the older versions of Seurat use the seed 42. But because methodology has changed between versions, especially how multiple samples are stored and normalised by default, it might not just be a seed issue. For example, Harmony as ran in a Seurat wrapper, and Harmony ran from the package have different input parameters.

ADD REPLY
0
Entering edit mode

I tried to change seed but it didn't affect the number or structure of clusters, it changed umap visualization. I am using the same script, same resolution parameter and getting less clusters. That's why I thought that maybe harmony creates different embeddings

ADD REPLY

Login before adding your answer.

Traffic: 2956 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6