Question

umap failed to cluster the cells

0

23 months ago

Dan ▴ 180

Hello

I tried UMAP visualization with scanpy:

import scanpy as sc
sc.pp.scale(adata, zero_center=True, max_value=None, copy=False, layer=None, obsm=None)
sc.pp.pca(adata, n_comps=50, use_highly_variable=True, svd_solver='arpack')
sc.pp.neighbors(adata, n_neighbors=50)

sc.tl.umap(adata, min_dist=0.5, spread=1.0)
sc.pl.umap(adata, color='fullname', use_raw=False, save='samples_umap.pdf')

But the cells don't separate well.

I tried another small dataset with scanpy using the same parameters as before: [image]

sc.tl.umap still failed to reduce the dimensionality of the data properly.

Then I tried the original umap package using the same data set:

import umap
import umap.plot
mapper = umap.UMAP().fit(adata.X)
umap.plot.points(mapper)

The original umap package reduces the dimensionality very well: [image]

I think there may be something wrong with the umap function in scanpy.

Can anyone please let me know the reason? Thanks a lot.

single-cell umap • 3.3k views

ADD COMMENT • link 16 months ago by Dan ▴ 180

1

16 months ago

schedulemerchant ▴ 10

I ran into the same problem and could even replicate it using the scanpy tutorial: https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html

It seems to be a compatibility issue between umap and scanpy. Setting maxiter to 500 solved the issue for me:

sc.tl.umap(adata, maxiter=500)

See this issue in scanpy: https://github.com/scverse/scanpy/issues/2337
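
To check whether a version mismatch is plausible on a given machine, it can help to print the installed versions before and after upgrading. This is a minimal sketch using only the standard library. The helper name installed_version is made up for illustration, and scanpy and umap-learn are the PyPI distribution names. It does not say which versions are affected; see the linked issue for that.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the versions of the two packages involved in this thread.
for name in ("scanpy", "umap-learn"):
    print(name, installed_version(name))
```

If the versions cannot be changed, passing maxiter explicitly as above remains the workaround reported here.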

0

Thanks for letting me know. I upgraded scanpy and solved this problem.

ADD REPLY • link 16 months ago by Dan ▴ 180

score 4 · Accepted Answer · 2022-12-22

4

23 months ago

Mensur Dlakic ★ 28k

UMAP is a dimensionality reduction technique, not a clustering method. It just so happens that sometimes after dimensionality reduction data points form clusters.

If data points don't separate into clearly visible clusters, it may be because they are all similar. From the faded image you provided, it seems that you have a set of outliers spread widely along UMAP1 (the X-axis), which artificially compresses the space available to separate the remaining points. If you can remove that group, the remaining data points may separate better.
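
One rough way to flag such a group is to mark points whose UMAP1 coordinate lies far from the median. The sketch below uses the median absolute deviation (MAD). The function name umap1_inlier_mask and the cutoff of 5 MADs are assumptions chosen for illustration, not something taken from the plot.

```python
from statistics import median


def umap1_inlier_mask(umap1, n_mads=5.0):
    """Return a list of booleans: True where a UMAP1 value is within n_mads MADs of the median."""
    values = [float(v) for v in umap1]
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No spread to measure against, so nothing is flagged.
        return [True] * len(values)
    return [abs(v - center) <= n_mads * mad for v in values]


# Hypothetical usage with an AnnData object that already has a UMAP embedding:
#   import numpy as np
#   mask = np.asarray(umap1_inlier_mask(adata.obsm["X_umap"][:, 0]))
#   adata_trimmed = adata[mask].copy()
# The neighbors graph and UMAP would then need to be recomputed on adata_trimmed.
```

Whether the flagged cells are technical artifacts or real biology has to be checked separately before dropping them.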

Please don't be offended, but it is always possible that you did something wrong here. It is impossible to tell how well you did it based on presently available information.

PS You can experiment with a different number of neighbors and see how that affects the plot. I suspect >50 neighbors may produce more defined point clusters.

PPS Varying the number of PCA components may help as well. Or better yet, try UMAP on raw data without PCA pre-processing. UMAP is fast enough that it could work.
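
A minimal sketch of both experiments with scanpy, assuming adata has already been scaled and, for the PCA variant, that sc.pp.pca has been run. The names embedding_key and umap_sweep and the neighbor values are made up for illustration. Each embedding is stored under its own key in adata.obsm so the plots can be compared side by side.

```python
def embedding_key(n_neighbors, use_pca):
    """Build the adata.obsm key under which one embedding is stored."""
    return f"X_umap_k{n_neighbors}_{'pca' if use_pca else 'raw'}"


def umap_sweep(adata, neighbor_values=(15, 50, 100), use_pca=True):
    """Recompute neighbors and UMAP for several n_neighbors values; return the obsm keys."""
    import scanpy as sc  # imported here so the helper above works without scanpy

    keys = []
    for k in neighbor_values:
        # use_rep="X" builds the graph directly on adata.X (no PCA);
        # use_rep="X_pca" uses the PCA stored by sc.pp.pca.
        sc.pp.neighbors(adata, n_neighbors=k, use_rep="X_pca" if use_pca else "X")
        sc.tl.umap(adata)
        key = embedding_key(k, use_pca)
        adata.obsm[key] = adata.obsm["X_umap"].copy()
        keys.append(key)
    return keys
```

Each stored embedding can then be plotted, for example with sc.pl.embedding(adata, basis=key, color='fullname'). The number of PCA components can be varied in the same loop through the n_pcs argument of sc.pp.neighbors.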

0

Hi Mensur:

I compared the UMAP in scanpy with the original umap package (https://umap-learn.readthedocs.io/en/latest/plotting.html) using the same dataset. The original umap works well, so I think the problem is in scanpy. I edited my question to include this. Do you have any suggestions? Thanks

Dan

ADD REPLY • link 23 months ago by Dan ▴ 180

1

Not sure why you need me to point this out because it seems obvious that scanpy is calling UMAP with different parameters.

sc.pp.neighbors(adata, n_neighbors=50)
sc.tl.umap(adata, min_dist=0.5, spread=1.0)

On the other hand, your plot using UMAP directly shows n_neighbors=15, min_dist=0.1, so there is your difference.
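
For a like-for-like comparison, the scanpy call can be rerun with the values shown on the umap-learn plot. The sketch below lists both parameter sets and the settings that differ. The names scanpy_params, umap_learn_params, differing_params, and rerun_scanpy_umap are made up for illustration, and spread=1.0 for umap-learn is its default value rather than something read from the plot. Note also that the direct call was fit on adata.X, while the scanpy pipeline in the question ran PCA first, so the inputs differ as well.

```python
# Values from the scanpy calls in the question.
scanpy_params = {"n_neighbors": 50, "min_dist": 0.5, "spread": 1.0}
# Values shown on the umap-learn plot; spread is the umap-learn default (assumption).
umap_learn_params = {"n_neighbors": 15, "min_dist": 0.1, "spread": 1.0}


def differing_params(a, b):
    """Return {name: (value_in_a, value_in_b)} for every setting that differs."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


def rerun_scanpy_umap(adata, params):
    """Recompute the neighbors graph and UMAP in scanpy with the given settings."""
    import scanpy as sc  # imported here so the comparison above works without scanpy

    sc.pp.neighbors(adata, n_neighbors=params["n_neighbors"])
    sc.tl.umap(adata, min_dist=params["min_dist"], spread=params["spread"])
    return adata


print(differing_params(scanpy_params, umap_learn_params))
```

Calling rerun_scanpy_umap(adata, umap_learn_params) and plotting again should show whether the parameter difference alone accounts for the gap between the two plots.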

ADD REPLY • link 23 months ago by Mensur Dlakic ★ 28k