Why is my single-cell cluster split?
21 months ago
paria ▴ 90

Hi all,

I am analyzing my single-nucleus dataset. I did clustering and cluster annotation, and I observed something I can't explain. Below is the UMAP projection of my dataset. As you can see, the MN_1 cluster (the green one) is split into two separate groups, and some of its cells are closer to the oligodendrocyte clusters (the blue ones) than to the rest of the MN cluster. I don't understand, first, why MN_1 is a single cluster yet appears split into two groups, and second, why some of its cells seem to be part of the Olig clusters.

[Image: UMAP projection of the annotated clusters]
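One way to put a number on "MN_1 is split into two groups" is to count how many separate islands the cells of one cluster form in the 2D embedding. The sketch below is illustrative only: `count_embedding_pieces` is a hypothetical helper, the coordinates are synthetic rather than the poster's data, and the distance threshold `eps` is an assumption you would tune to the scale of your own UMAP. Cells closer than `eps` are linked, and islands are the connected components of that graph (single linkage).

```python
import numpy as np

def count_embedding_pieces(coords, labels, cluster, eps):
    """Count how many separate islands one cluster forms in a 2D embedding.

    Hypothetical helper: two cells of the cluster are linked when their 2D
    distance is below eps; islands are the connected components of that graph.
    """
    pts = coords[labels == cluster]
    n = len(pts)
    # Pairwise Euclidean distances between the cluster's cells
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    adjacent = dists < eps
    component = -np.ones(n, dtype=int)  # -1 means "not visited yet"
    n_components = 0
    for start in range(n):
        if component[start] >= 0:
            continue
        # Depth-first search from an unvisited cell
        stack = [start]
        component[start] = n_components
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adjacent[i] & (component < 0)):
                component[j] = n_components
                stack.append(j)
        n_components += 1
    return n_components, component

# Synthetic example: one label whose cells sit in two separate 2D blobs
rng = np.random.default_rng(0)
coords = np.vstack([rng.normal(0, 0.2, (50, 2)), rng.normal(5, 0.2, (50, 2))])
labels = np.array(["MN_1"] * 100)
n_pieces, pieces = count_embedding_pieces(coords, labels, "MN_1", eps=1.0)
print(n_pieces)
```

A count above one only says the cluster is discontinuous in this particular 2D view; it does not by itself say the cells differ biologically.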

Any help is really appreciated!

Thanks,
Paria

Tags: clustering, single-cell
21 months ago
Mensur Dlakic ★ 28k

A couple of things. Referring to clusters as green and blue, when there are at least three shades of each color, is not very informative. Of the three green colors in the image, the one you refer to as green I'd call the least green.

Assuming you didn't make a mistake in connecting labels to points, one has to remember that cells behave somewhat stochastically. It is normal that a small subpopulation may not behave exactly as expected. How certain are you about cell designations?
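One way to check how certain the cell designations are is to score each cell on marker genes for both candidate identities and flag cells whose "other" score wins. This is a minimal sketch under stated assumptions: the helper `flag_ambiguous_cells` is hypothetical, the expression matrix is a tiny made-up example, and the marker lists (CHAT, ISL1, MNX1 for motor neurons; MBP, PLP1, MOG for oligodendrocytes) are illustrative choices, not the poster's annotation scheme.

```python
import numpy as np

def flag_ambiguous_cells(expr, genes, markers_a, markers_b):
    """Return a boolean mask of cells whose mean expression of marker set B
    exceeds their mean expression of marker set A.

    expr: cells x genes matrix of normalized expression values
    genes: gene names matching the columns of expr
    """
    col = {g: k for k, g in enumerate(genes)}
    score_a = expr[:, [col[g] for g in markers_a]].mean(axis=1)
    score_b = expr[:, [col[g] for g in markers_b]].mean(axis=1)
    return score_b > score_a

# Made-up example with two cells and illustrative marker genes
genes = ["CHAT", "ISL1", "MNX1", "MBP", "PLP1", "MOG"]
expr = np.array([
    [3.0, 2.5, 2.0, 0.1, 0.2, 0.0],  # motor-neuron-like profile
    [0.2, 0.1, 0.0, 2.8, 3.1, 1.9],  # oligodendrocyte-like profile
])
mask = flag_ambiguous_cells(expr, genes,
                            ["CHAT", "ISL1", "MNX1"],
                            ["MBP", "PLP1", "MOG"])
print(mask)
```

Applied to the MN_1 cells that sit next to the Olig clusters, a high fraction of flagged cells would suggest the labels rather than the embedding deserve a second look.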

Another thing: UMAP is not a definitive truth; it is just UMAP's best attempt at embedding higher-dimensional data in 2D space. Just as UMAP splits up your MN_1 and MN_2 cells, it embeds all OLG_1-5 cells into one large blob, and you can ask the inverse question: why does UMAP not split these cells?
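How faithfully a 2D embedding reflects the original space can be measured by asking, for each cell, how many of its k nearest neighbors in the high-dimensional space are still among its k nearest neighbors in 2D. The sketch below assumes random synthetic data and uses a 2-component PCA projection as a stand-in for UMAP (UMAP itself is not run here); the helper names are hypothetical. A score well below 1 means the 2D picture rearranges neighborhoods, which is exactly when splits and merges like the ones in the question can appear.

```python
import numpy as np

def knn_indices(x, k):
    """Indices of the k nearest neighbors of every row, excluding the row itself."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def neighbor_preservation(high, low, k=10):
    """Mean fraction of each cell's k high-dimensional neighbors that are
    still among its k neighbors in the low-dimensional embedding."""
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return np.mean(shared) / k

# Synthetic 50-dimensional data projected to 2D with PCA (stand-in for UMAP)
rng = np.random.default_rng(0)
high = rng.normal(size=(200, 50))
centered = high - high.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
low = centered @ vt[:2].T
score = neighbor_preservation(high, low, k=10)
print(score)
```

For real data you would pass the PCA coordinates used for clustering as `high` and the UMAP coordinates as `low`.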

To find the answer, you would have to dig into the inner workings of UMAP and the details of this particular dataset. I have yet to meet someone who can really explain to me how UMAP works.
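A first step into those inner workings is the fuzzy k-nearest-neighbor graph UMAP builds before it computes any 2D layout. The sketch below follows the construction described in the UMAP paper in simplified form: for each point, `rho` is the distance to its nearest neighbor, `sigma` is found by bisection so that the neighbor weights sum to log2(k), and the directed weights are combined with the fuzzy union a + b - ab. It is brute-force and omits details of the real umap-learn implementation (approximate neighbor search, bandwidth and local-connectivity options, and the layout optimization itself); the function name is hypothetical.

```python
import numpy as np

def fuzzy_knn_graph(x, k=15, n_iter=64):
    """Simplified UMAP-style fuzzy kNN graph (brute force, illustration only)."""
    n = len(x)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    nn_d = np.take_along_axis(d, nn, axis=1)
    target = np.log2(k)
    w = np.zeros((n, n))
    for i in range(n):
        rho = nn_d[i, 0]  # distance to the nearest neighbor
        excess = np.maximum(nn_d[i] - rho, 0.0)
        lo, hi, sigma = 0.0, np.inf, 1.0
        # Bisection: the weight sum grows with sigma, so find sum == log2(k)
        for _ in range(n_iter):
            total = np.exp(-excess / sigma).sum()
            if total > target:  # too much weight: shrink sigma
                hi = sigma
                sigma = (lo + hi) / 2
            else:  # too little weight: grow sigma
                lo = sigma
                sigma = sigma * 2 if hi == np.inf else (lo + hi) / 2
        w[i, nn[i]] = np.exp(-excess / sigma)
    # Fuzzy union of the directed graph with its transpose
    return w + w.T - w * w.T

rng = np.random.default_rng(1)
points = rng.normal(size=(60, 5))
graph = fuzzy_knn_graph(points, k=10)
```

Because each point's nearest neighbor always receives weight 1, sparse and dense regions are rescaled to look locally alike; the 2D layout then tries to reproduce this graph, not the original distances.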

Another thing: UMAP is not a definitive truth; it is just UMAP's best attempt at embedding higher-dimensional data in 2D space.

This can be stated differently. UMAP and all dimensionality reduction methods work with numbers, and create an embedding based on those numeric vectors. They don't know or care about biology, or any other field the numbers may come from. I think it is more likely that the inconsistency comes from the data - or the process that produced them - than from UMAP.
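If the inconsistency comes from the data, it should already be visible before UMAP. Assuming, as in common Seurat/Scanpy workflows, that the clusters were computed on PCA coordinates rather than on the UMAP itself, one rough check is to ask which cluster centroid each cell is closest to in PCA space. This is a simplification (graph-based clustering does not use centroids), the helper is hypothetical, and the data below are synthetic: five cells labelled MN_1 are deliberately placed next to the OLG_1 cells.

```python
import numpy as np

def nearest_centroid_labels(pcs, labels):
    """For every cell, the label of the closest cluster centroid in PCA space."""
    names = np.unique(labels)
    centroids = np.stack([pcs[labels == c].mean(axis=0) for c in names])
    d = np.linalg.norm(pcs[:, None, :] - centroids[None, :, :], axis=-1)
    return names[np.argmin(d, axis=1)]

# Synthetic 20-dimensional "PCA" coordinates for two clusters
rng = np.random.default_rng(0)
mn = rng.normal(0, 1, (100, 20))
olg = rng.normal(6, 1, (100, 20))
odd = rng.normal(6, 1, (5, 20))  # labelled MN_1 but located among OLG_1 cells
pcs = np.vstack([mn, olg, odd])
labels = np.array(["MN_1"] * 100 + ["OLG_1"] * 100 + ["MN_1"] * 5)

nearest = nearest_centroid_labels(pcs, labels)
suspicious = np.flatnonzero(nearest != labels)
print(suspicious)
```

Cells that are closer to another cluster in the space used for clustering are the ones worth inspecting for doublets, low quality, or mislabeling.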

I have yet to meet someone who can really explain to me how UMAP works.

It is not that difficult to understand how it works, or why sometimes it doesn't work the way we expect. In short: it is not UMAP's fault that reducing data dimensions sometimes yields interpretable clusters, and other times not so much. That is likely due to the stochastic nature of biological systems, or errors in data collection. Also, let's not forget that not all datasets are readily interpretable when reduced to two dimensions.
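A quick, linear proxy for "not all datasets are readily interpretable when reduced to two dimensions" is the fraction of total variance captured by the first two principal components. This is a sketch on synthetic data: PCA is used here because it is easy to compute with NumPy, and UMAP, being nonlinear, can capture structure that PCA misses, so a low value is a warning sign rather than a verdict. The helper name is hypothetical.

```python
import numpy as np

def variance_explained_by_top(x, n_dims=2):
    """Fraction of total variance captured by the first n_dims principal components."""
    centered = x - x.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values, descending
    var = s ** 2
    return var[:n_dims].sum() / var.sum()

rng = np.random.default_rng(0)
# Data that really is two-dimensional (plus tiny noise) in 50 dimensions
latent = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 50))
x_low_rank = latent @ loadings + 0.01 * rng.normal(size=(300, 50))
# Data with no dominant directions
x_isotropic = rng.normal(size=(300, 50))

print(variance_explained_by_top(x_low_rank))
print(variance_explained_by_top(x_isotropic))
```

The first dataset is almost fully described by two dimensions; the second is not, and any 2D picture of it necessarily discards most of its structure.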
