I analysed a 10x dataset with the Seurat package. When I used the TSNEPlot function to plot the clustering result, I found that the number of clusters always differs. How can I control the number of clusters? Which function or parameter can I use to limit it?
The FindClusters function implements the procedure, and contains a
resolution parameter that sets the ‘granularity’ of the downstream
clustering, with increased values leading to a greater number of
clusters. We find that setting this parameter between 0.6-1.2
typically returns good results for single cell datasets of around 3K
cells. Optimal resolution often increases for larger datasets. The
clusters are saved in the object@ident slot.
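To see how the resolution parameter changes the number of clusters on your own data, you could scan a few values. The following is only a sketch: the helper name `count_clusters_by_resolution` and the default grid of resolutions are made up for illustration, and it assumes `so` is a Seurat object on which `FindNeighbors` has already been run.

```r
# Hypothetical helper (not part of Seurat): cluster at each resolution
# and report how many clusters come out.
# Assumes `so` is a Seurat object with a neighbor graph (FindNeighbors already run).
count_clusters_by_resolution <- function(so, resolutions = seq(0.2, 1.2, by = 0.2)) {
  n_clusters <- vapply(resolutions, function(res) {
    clustered <- Seurat::FindClusters(so, resolution = res, verbose = FALSE)
    # FindClusters stores the latest clustering as a factor in `seurat_clusters`
    nlevels(clustered$seurat_clusters)
  }, integer(1))
  setNames(n_clusters, resolutions)
}
```

Calling `count_clusters_by_resolution(so)` returns a named integer vector (resolution as name, cluster count as value), from which you can pick the resolution that gives a cluster count close to what you want.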
If you want a specific number of clusters without trying different settings for the resolution parameter,
you can first over-cluster, then cluster the cluster centers hierarchically, and cut the resulting tree at the desired number of clusters, like so:
so <- Seurat::FindClusters(so) # the default resolution should yield multiple clusters (if not, your data might not have any structure)
X <- Seurat::AverageExpression(so, assays = SeuratObject::DefaultAssay(so), slot = "scale.data", group.by = "seurat_clusters")[[1]] # average scaled expression for each variable gene and cluster (AggregateExpression would sum over cells instead of averaging)
dist1 <- dist(t(X)) # distances between cluster centers (one column of X per cluster)
hclust1 <- hclust(dist1)
plot(hclust1)
clust2 <- cutree(hclust1, k = 2) # assign each cluster to one of two super-clusters
so$merged_seurat_clusters <- data.frame(merged_seurat_clusters = t(t(clust2)))[so$seurat_clusters, ] # join the super-cluster assignment back to the original Seurat object
This approach is quite robust to upstream changes, since the final number of clusters is fixed by k.
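The merge step can be tried without Seurat on a toy example. In the sketch below the centroid matrix and the per-cell labels are invented for illustration; only base R functions (`dist`, `hclust`, `cutree`) are used. It looks up each cell's super-cluster by cluster label, which assumes the column names of the centroid matrix equal the cluster labels; with a real Seurat object it is worth checking `colnames(X)` first, since that may depend on the Seurat version.

```r
# Toy cluster centers: 5 features (rows) x 4 fine clusters (columns). Values are made up.
centers <- cbind(
  "0" = c(5, 5, 0, 0, 1),
  "1" = c(5, 4, 0, 1, 1),
  "2" = c(0, 0, 5, 5, 1),
  "3" = c(0, 1, 5, 4, 1)
)

# Hierarchically cluster the centers and cut the tree into k = 2 super-clusters.
tree <- hclust(dist(t(centers)))
super <- cutree(tree, k = 2)  # named integer vector; names are the fine cluster labels

# Hypothetical per-cell fine-cluster labels, stored as a factor like so$seurat_clusters.
cell_clusters <- factor(c("0", "3", "1", "2", "2", "0"), levels = colnames(centers))

# Map every cell to its super-cluster by label rather than by the factor's integer codes.
cell_super <- unname(super[as.character(cell_clusters)])
```

Here clusters "0" and "1" end up in one super-cluster and "2" and "3" in the other, so `cell_super` holds one super-cluster id per cell.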
There is now also Seurat::BuildClusterTree, which makes this easier.
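A minimal sketch of how that might look, assuming `so` already carries cluster identities (for example from `FindClusters`) and that the `ape` package is installed, which Seurat's tree functions need. The wrapper name `cluster_tree` is hypothetical. Note that this only builds and plots the tree; merging clusters at a chosen number of groups is still up to you.

```r
# Hypothetical wrapper (not part of Seurat): build the cluster tree, plot it,
# and return the tree object for further inspection.
# Assumes `so` is a Seurat object with cluster identities and that 'ape' is installed.
cluster_tree <- function(so) {
  so <- Seurat::BuildClusterTree(so)                         # tree over the current identities
  tree <- SeuratObject::Tool(so, slot = "BuildClusterTree")  # stored as an ape 'phylo' object
  Seurat::PlotClusterTree(so)                                # draw the dendrogram
  tree
}
```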