Parallelization can help, so be sure to set BPPARAM if you are using scater's runUMAP. It still takes a while, though. I agree with ATpoint in that I tend to prefer greater spread than the defaults, though I have never gone up to a min.dist of 0.75. Here's a function I use to generate a whole bunch of embeddings, which I then plot, picking (pretty arbitrarily) whichever one best balances global and local structure for my needs:
library(SingleCellExperiment)
library(scater)
library(BiocParallel)

# Run UMAP for every combination of min_dist, n_neighbors and spread,
# storing each embedding in reducedDims(sce) under a descriptive name.
umap_sweep <- function(sce, dim_reduc,
                       min_dist = c(0.01, 0.02, 0.05, 0.1, 0.2, 0.3),
                       n_neighbors = c(10, 15, 20, 30, 40, 50),
                       spread = c(0.8, 1, 1.2),
                       BPPARAM = BiocParallel::bpparam()) {
  for (d in min_dist) {
    for (n in n_neighbors) {
      for (sp in spread) {
        message("Running UMAP with min_dist = ", d,
                ", n_neighbors = ", n, ", spread = ", sp)
        sce <- runUMAP(sce, n_neighbors = n, min_dist = d, spread = sp,
                       name = paste0("UMAP_m.dist", d, "_n.neigh", n, "_spread", sp),
                       dimred = dim_reduc, ncomponents = 2, BPPARAM = BPPARAM)
      }
    }
  }
  return(sce)
}

# dim_reduc names an existing reduced dimension (here "PCA") to use as input.
sce <- umap_sweep(sce, dim_reduc = "PCA")
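To look through the embeddings that the sweep stores, something like the following might work. It is only a sketch: plot_umap_sweep is a made-up helper name, and it assumes every sweep result lives in reducedDims(sce) under a name starting with "UMAP_", as the function above produces.

```r
library(SingleCellExperiment)
library(scater)

# Hypothetical helper: build one plot per stored embedding whose name
# matches `pattern`, titled with the parameter string in its name.
plot_umap_sweep <- function(sce, colour_by = NULL, pattern = "^UMAP_") {
  umap_names <- grep(pattern, reducedDimNames(sce), value = TRUE)
  plots <- lapply(umap_names, function(nm) {
    plotReducedDim(sce, dimred = nm, colour_by = colour_by) +
      ggplot2::ggtitle(nm)
  })
  names(plots) <- umap_names
  plots
}

# Example use ("cluster" is a placeholder for a column in colData(sce)):
# plots <- plot_umap_sweep(sce, colour_by = "cluster")
# cowplot::plot_grid(plotlist = plots[1:9], ncol = 3)
```

Paging through the list a handful of plots at a time tends to be easier than squeezing every combination onto one page.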
Generally, I never find those with min.dist < 0.1 to be what I'm looking for, so feel free to adjust the defaults/inputs here to play with things more (or limit them so things run faster). n_neighbors also has a relatively limited impact in comparison to the other parameters.
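To get a feel for how much trimming the inputs saves, here is a small base-R sketch that counts the runs a given set of inputs produces. sweep_grid is a hypothetical helper, not part of scater, and the trimmed values are just one illustrative choice.

```r
# Hypothetical helper: enumerate every parameter combination and the
# reducedDim name that the sweep would assign to it.
sweep_grid <- function(min_dist, n_neighbors, spread) {
  grid <- expand.grid(min_dist = min_dist, n_neighbors = n_neighbors,
                      spread = spread, KEEP.OUT.ATTRS = FALSE)
  grid$name <- paste0("UMAP_m.dist", grid$min_dist,
                      "_n.neigh", grid$n_neighbors,
                      "_spread", grid$spread)
  grid
}

# The sweep's defaults versus an illustrative trimmed set.
full <- sweep_grid(c(0.01, 0.02, 0.05, 0.1, 0.2, 0.3),
                   c(10, 15, 20, 30, 40, 50),
                   c(0.8, 1, 1.2))
trimmed <- sweep_grid(c(0.1, 0.2, 0.3), c(15, 30), c(0.8, 1, 1.2))

nrow(full)     # 108 UMAP runs
nrow(trimmed)  # 18 UMAP runs
```

The same trimmed vectors can then be passed straight to the sweep, e.g. umap_sweep(sce, dim_reduc = "PCA", min_dist = c(0.1, 0.2, 0.3), n_neighbors = c(15, 30)).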
Not that I am aware of. Most of the time, UMAP is just for visualization. Personally, I prefer to have points scattered out rather than bunched up, and I do well with spread and min.dist of 0.75. The rest (referring to uwot::umap and scater::runUMAP in R) I leave at the defaults.
That entirely depends on the dataset and the message you want to send as well as the narrative of the story.
Thanks, ATpoint!