umap failed to cluster the cells
23 months ago
Dan ▴ 180

Hello

I tried a UMAP visualization with scanpy:

sc.pp.scale(adata, zero_center=True, max_value=None, copy=False, layer=None, obsm=None)
sc.pp.pca(adata, n_comps=50, use_highly_variable=True, svd_solver='arpack')
sc.pp.neighbors(adata, n_neighbors=50)

sc.tl.umap(adata, min_dist=0.5, spread=1.0)
sc.pl.umap(adata, color='fullname', use_raw=False, save='samples_umap.pdf')

But the cells don't separate well: [image]

I tried another, smaller dataset with scanpy using the same parameters as before: [image]

sc.tl.umap still failed to reduce the dimensionality of the data properly.

Then I tried the original umap package using the same data set:

import umap
import umap.plot
mapper = umap.UMAP().fit(adata.X)
umap.plot.points(mapper)

The original umap package reduces the dimensionality very well: [image]

I think there may be something wrong with the umap function in scanpy.

Can anyone please let me know the reason? Thanks a lot.

23 months ago
Mensur Dlakic ★ 28k

UMAP is a dimensionality reduction technique, not a clustering method. It just so happens that sometimes after dimensionality reduction data points form clusters.

If data points don't separate into clearly visible clusters, that can be because they are all similar. From the faded image you provided, it seems like you have a set of outliers that spread quite a bit along UMAP1 (the X-axis), which compresses the visible separation between the remaining points. If you can remove that group, it is possible that the remaining data points will separate better.
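One way to act on the outlier suggestion is to flag cells whose UMAP1 coordinate falls far outside the bulk of the data. The sketch below applies a plain interquartile-range rule to toy numbers. The `flag_outliers` helper, the `k=1.5` multiplier, and the toy coordinates are all illustrative assumptions, not something taken from the data in the question.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return a list of booleans, True where a value lies outside the IQR fences.

    Hypothetical helper: the fences are q1 - k*IQR and q3 + k*IQR.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Toy UMAP1 coordinates: a compact group of 18 points plus two far-away points.
umap1 = [i * 0.1 for i in range(-9, 9)] + [25.0, 27.5]

is_outlier = flag_outliers(umap1)
kept = [v for v, out in zip(umap1, is_outlier) if not out]
print(f"flagged {sum(is_outlier)} of {len(umap1)} points")
```

On real data the input would be something like the first column of `adata.obsm["X_umap"]`. After dropping the flagged cells, the neighbor graph and the embedding should be recomputed, because removing cells changes both.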

Please don't be offended, but it is always possible that you did something wrong here. It is impossible to tell how well you did it based on presently available information.

PS You can experiment with a different number of neighbors and see how that affects the plot. I suspect >50 neighbors may produce more defined point clusters.

PPS Varying the number of PCA components may help as well. Or better yet, try UMAP on raw data without PCA pre-processing. UMAP is fast enough that it could work.
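The two postscripts amount to a small parameter sweep over the number of neighbors and the number of principal components, including a run without PCA. A minimal sketch follows. The helper names and the specific grid values are assumptions chosen for illustration, and `sweep_umap` assumes that `sc.pp.pca` has already been run with at least as many components as the largest `n_pcs` in the grid.

```python
from itertools import product


def parameter_grid(neighbor_values, pc_values):
    """All (n_neighbors, n_pcs) pairs to try; n_pcs=None means 'use adata.X directly'."""
    return list(product(neighbor_values, pc_values))


def sweep_umap(adata, grid):
    """Recompute neighbors and UMAP for each setting; returns {(k, n_pcs): embedding}."""
    import scanpy as sc  # imported lazily so the grid helper works without scanpy

    embeddings = {}
    for n_neighbors, n_pcs in grid:
        if n_pcs is None:
            # Skip PCA and build the neighbor graph on the expression matrix itself.
            sc.pp.neighbors(adata, n_neighbors=n_neighbors, use_rep="X")
        else:
            # Use only the first n_pcs components of the stored PCA.
            sc.pp.neighbors(adata, n_neighbors=n_neighbors, n_pcs=n_pcs, use_rep="X_pca")
        sc.tl.umap(adata)
        embeddings[(n_neighbors, n_pcs)] = adata.obsm["X_umap"].copy()
    return embeddings


# Illustrative values only; adjust to the size of the dataset.
grid = parameter_grid([15, 50, 100], [20, 50, None])
print(grid)
```

Each stored embedding can then be plotted side by side to see how the settings change the layout.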

Hi Mensur:

I compared the umap in scanpy with the original umap (https://umap-learn.readthedocs.io/en/latest/plotting.html) using the same dataset. The original umap works well, so I think the problem is in scanpy. I edited my question to include this. Do you have any suggestions? Thanks

Dan

Not sure why you need me to point this out because it seems obvious that scanpy is calling UMAP with different parameters.

sc.pp.neighbors(adata, n_neighbors=50)
sc.tl.umap(adata, min_dist=0.5, spread=1.0)

On the other hand, your plot using UMAP directly shows n_neighbors=15, min_dist=0.1, so there is your difference.
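To make the comparison concrete, the settings of the two runs can be written down and diffed. This is only a sketch. The `input` entries are an assumption: they rely on `sc.pp.neighbors` using the stored PCA representation by default, whereas the direct call in the question fits on `adata.X`. The helper names are hypothetical.

```python
# Settings from the question versus the values shown on the direct umap-learn plot.
scanpy_run = {"n_neighbors": 50, "min_dist": 0.5, "spread": 1.0, "input": "X_pca (50 PCs)"}
umap_run = {"n_neighbors": 15, "min_dist": 0.1, "spread": 1.0, "input": "adata.X"}


def diff_settings(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def umap_learn_matched(adata):
    """Run umap-learn with the scanpy settings from the question, on the PCA input."""
    import umap  # imported lazily; requires umap-learn

    reducer = umap.UMAP(n_neighbors=50, min_dist=0.5, spread=1.0)
    return reducer.fit_transform(adata.obsm["X_pca"])


differences = diff_settings(scanpy_run, umap_run)
for name, (scanpy_value, umap_value) in differences.items():
    print(f"{name}: scanpy={scanpy_value!r}  umap-learn={umap_value!r}")
```

If the two libraries are given the same input and the same parameters and the layouts still differ substantially, that would point away from parameter choice as the explanation.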

16 months ago

I ran into the same problem and could even replicate it using the scanpy tutorial: https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html

It seems to be a compatibility issue between umap and scanpy. Setting maxiter to 500 solved the issue for me:

sc.tl.umap(adata, maxiter=500)

See this issue in scanpy: https://github.com/scverse/scanpy/issues/2337
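Since the behaviour appears to depend on which versions of scanpy and umap-learn are installed together, it can help to record them before comparing results or reporting a problem. A minimal sketch using only the standard library is shown below. The `installed_versions` helper is hypothetical, and the package list (scanpy, umap-learn, and two umap-learn dependencies) is just a reasonable starting point.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions(["scanpy", "umap-learn", "pynndescent", "numba"])
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```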

Thanks for letting me know. I upgraded scanpy and solved this problem.
