Hey y'all,
This is a follow-up to the solution here: A: How to use FindSubCluster in Seurat?
After subclustering with FindSubCluster, how do I run FindAllMarkers using the additional cluster assignments on the whole Seurat object? For some reason, the cluster I subclustered is skipped during FindAllMarkers ("Calculating cluster 6" in the output below has no progress bar). I think FindSubCluster is new in Seurat v4.0. Any help would be appreciated.
```
> scfp <- FindNeighbors(scfp, graph.name = "test", dims = 1:100)
Computing nearest neighbor graph
Computing SNN
> scfp <- FindClusters(scfp, graph.name = "test", resolution = 2, algorithm = 1, verbose = TRUE)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1836
Number of edges: 16978
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.7368
Number of communities: 24
Elapsed time: 0 seconds
4 singletons identified. 20 final clusters.
> #scfp <- RunUMAP(scfp, dims = 1:100)
> scfp <- RunTSNE(scfp, dims = 1:100)
> #DimPlot(scfp, reduction = "umap", label = TRUE)
> DimPlot(scfp, reduction = "tsne", label = TRUE, label.size = 6 )
> scfp <- FindSubCluster(scfp, "6", "test", subcluster.name = "blood", resolution = .3, algorithm = 1)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 104
Number of edges: 819
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.7301
Number of communities: 3
Elapsed time: 0 seconds
> DimPlot(scfp, reduction = "tsne", group.by = "blood", label = TRUE, label.size = 6)
> scfp.markers <- FindAllMarkers(scfp, graph.name = "test", group.by = "blood", only.pos = TRUE, min.pct = 0.1, logfc.threshold = 0.25)
Calculating cluster 0
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=08s
Calculating cluster 1
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=05s
Calculating cluster 2
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=07s
Calculating cluster 3
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=18s
Calculating cluster 4
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=16s
Calculating cluster 5
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=07s
Calculating cluster 6
Calculating cluster 7
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=13s
Calculating cluster 8
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=17s
Calculating cluster 9
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=17s
Calculating cluster 10
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=24s
Calculating cluster 11
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=13s
Calculating cluster 12
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=03s
Calculating cluster 13
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=18s
Calculating cluster 14
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=12s
Calculating cluster 15
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=13s
Calculating cluster 16
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=13s
Calculating cluster 17
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=14s
Calculating cluster 18
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=17s
Calculating cluster 19
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=14s
> View(scfp.markers)
```
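A minimal sketch of one way to include the subclusters, assuming Seurat v4 behaviour: FindAllMarkers appears to loop over the levels of the active identities, Idents(object), rather than over a group.by column, and graph.name does not appear to be one of its arguments. In the session above the active identities are still the 20 clusters from FindClusters, so when cluster "6" is looked up in the "blood" column (where FindSubCluster has presumably replaced it with labels such as 6_0, 6_1 and 6_2) that comparison fails and is skipped, while the subclusters themselves are never tested. Making the subcluster column the active identity class first should avoid this. So that it runs on its own, the sketch uses the pbmc_small example object bundled with Seurat; the names obj, RNA_snn and sub are illustrative stand-ins for scfp, "test" and "blood".

```r
library(Seurat)

# Illustration on the small example object bundled with Seurat.
# For the session above, substitute `scfp`, graph "test" and column "blood".
data("pbmc_small")
obj <- FindNeighbors(pbmc_small, dims = 1:10, verbose = FALSE)  # builds graphs "RNA_nn" and "RNA_snn"
obj <- FindClusters(obj, resolution = 1, verbose = FALSE)        # also sets Idents() to these clusters
obj <- FindSubCluster(obj, cluster = "0", graph.name = "RNA_snn",
                      subcluster.name = "sub", resolution = 0.5)

# FindAllMarkers() tests each level of the *active* identities,
# so make the subcluster column the active identity class first.
Idents(obj) <- "sub"
table(Idents(obj))  # cluster "0" now appears as levels such as "0_0", "0_1", ...

# Same settings as the original call, minus group.by and graph.name
markers <- FindAllMarkers(obj, only.pos = TRUE, min.pct = 0.1,
                          logfc.threshold = 0.25, verbose = FALSE)
head(markers)
```

On the real data this would amount to Idents(scfp) <- "blood" followed by the original FindAllMarkers call without the group.by and graph.name arguments.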
I don't see it in your code, but you should also set the default assay to RNA, as the DE analysis should be based on the raw (uncorrected) counts:
DefaultAssay(scfp) <- "RNA"
Or, if you don't want to change the assay globally, pass it to the call instead: FindAllMarkers(..., assay = "RNA", ...).

Thank you for responding @fracarb8.
Is it standard in the field to do DE analysis on raw counts? I saw this mentioned on the Biostars Slack channel as well, where someone suggested using counts for DESeq2 analysis, though I'm not sure if it's the same situation. Could you explain why, if you know? The Seurat tutorial runs the analysis on the log-normalized and scaled data, although it does touch on how to use "counts" in the analysis if one desires. I would appreciate your insight.
Kindly, Pratik
You use raw counts mainly to ensure that the data you are using are independent between samples. Packages like DESeq2 implement their own normalisation, so you don't want to feed them normalised or transformed counts. On top of that, the appropriate normalisation depends on the comparison you are testing. Imagine you have 20 samples but are only interested in the differences between sample4 (id1) and sample18 (id2). You don't care about the effects of the other samples, so a "global" normalisation across all 20 can be misleading. Regarding Seurat, things are a bit confusing (at least to me :D), but the general consensus would be to 1) integrate, 2) set DefaultAssay(object) <- "RNA", 3) normalise your counts if you used SCT, and then 4) perform the DE analysis. You should have a look at the Issues section of the official GitHub page, as it is filled with such questions.
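Steps 2 to 4 of that workflow might look roughly like the sketch below. It assumes the object has already been integrated (step 1) and still carries its original "RNA" assay; the helper run_de_on_rna and its used_sct flag are hypothetical names for illustration, not Seurat functions, and pbmc_small (bundled with Seurat) is used only so the example runs on its own.

```r
library(Seurat)

# Hypothetical helper (not part of Seurat) covering steps 2-4 above.
# Assumes `obj` has already been integrated (step 1) and still has an "RNA" assay.
run_de_on_rna <- function(obj, used_sct = FALSE) {
  # Step 2: switch from the "integrated" (or "SCT") assay to the uncorrected RNA assay
  DefaultAssay(obj) <- "RNA"
  # Step 3: with an SCTransform workflow the RNA assay may never have been
  # log-normalised, so normalise its counts before testing
  if (used_sct) {
    obj <- NormalizeData(obj)
  }
  # Step 4: DE on the RNA assay for every level of the active identities
  FindAllMarkers(obj, assay = "RNA", only.pos = TRUE,
                 min.pct = 0.1, logfc.threshold = 0.25)
}

# Illustration on the example object shipped with Seurat (already log-normalised)
data("pbmc_small")
markers <- run_de_on_rna(pbmc_small, used_sct = FALSE)
head(markers)
```

Keeping the assay switch inside the function leaves the caller's default assay untouched, which mirrors the assay = "RNA" alternative to changing DefaultAssay globally.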