Hi all,
I am analyzing my single-nucleus dataset. I did clustering and cluster annotation, and I observed something that I can't explain. Below is the UMAP projection of my dataset. As you can see, the MN_1 cluster (the green one) is clustered separately, but some of its cells are closer to the oligodendrocyte clusters (shown in blue) than to the rest of the MN cluster. I don't understand, first, why MN_1 is a single cluster but is split into two groups on the plot, and second, why some of its cells seem to be part of the Olig clusters.
Any help is really appreciated!
Thanks,
Paria
Another thing: UMAP is not a definitive truth; it is just UMAP's best attempt at embedding higher-dimensional data in 2D space. Just as UMAP splits up your MN_1 and MN_2 cells, it embeds all OLG_1-5 cells into one large blob, and you can ask the inverse question: why does UMAP not split these cells?
To find the answer you would have to dig into the inner workings of UMAP and the details of this particular dataset. I have yet to meet someone who can really explain to me how UMAP works.
This can be stated differently. UMAP and all dimensionality reduction methods work with numbers, and create an embedding based on those numeric vectors. They don't know or care about biology, or any other field the numbers may come from. I think it is more likely that the inconsistency comes from the data - or the process that produced them - than from UMAP.
It is not that difficult to understand how UMAP works, or why it sometimes doesn't work the way we expect. In short: it is not UMAP's fault that reducing data dimensions sometimes yields interpretable clusters, and other times not so much. That is likely due to the stochastic nature of biological systems, or errors in data collection. Also, let's not forget that not all datasets are readily interpretable when reduced to two dimensions.
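One way to check whether the odd placement comes from the data rather than from the 2D plot is to look at distances in the higher-dimensional space that was fed to UMAP (for example the PCA coordinates). Below is a minimal sketch of that idea, not a standard method: the function name `flag_ambiguous_cells`, the centroid-distance heuristic, and the toy coordinates are all assumptions made for illustration. In practice you would pass in your own per-cell vectors and cluster labels.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Mean coordinate of a list of equal-length vectors."""
    return [fmean(axis) for axis in zip(*points)]


def flag_ambiguous_cells(embedding, labels, cluster, other):
    """Indices of cells labelled `cluster` that lie closer to the centroid
    of `other` than to the centroid of their own cluster.

    Hypothetical helper: a crude heuristic, not part of any library.
    """
    own = centroid([v for v, lab in zip(embedding, labels) if lab == cluster])
    alt = centroid([v for v, lab in zip(embedding, labels) if lab == other])
    return [
        i
        for i, (v, lab) in enumerate(zip(embedding, labels))
        if lab == cluster and dist(v, alt) < dist(v, own)
    ]


# Toy 3-dimensional "PCA" coordinates (made up, not real data).
embedding = [
    [0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0],
    [0.0, 0.2, 0.0],
    [4.0, 4.0, 4.0],  # labelled MN_1 but sits near the OLG_1 cells
    [5.0, 5.0, 5.0],
    [5.2, 5.0, 5.0],
    [5.0, 5.2, 5.0],
]
labels = ["MN_1", "MN_1", "MN_1", "MN_1", "OLG_1", "OLG_1", "OLG_1"]

ambiguous = flag_ambiguous_cells(embedding, labels, "MN_1", "OLG_1")
print(ambiguous)  # prints [3]
```

If the flagged cells are the same ones that sit next to the oligodendrocyte blob on the UMAP, the pattern is probably in the data (for example doublets, ambient RNA, or a borderline cluster assignment) and not something UMAP invented.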