There are multiple posts (1, 2, 3, 4) on this website that touch on this question tangentially, but I haven't found any that ask it directly: would you consider using UMAP, t-SNE, diffusion maps, ForceAtlas2, ICA or any other low-dimensional embedding as the basis for cell clustering (whatever the downstream clustering method: k-means, sNN, Louvain, etc.)?
More often than not in my analyses, with the standard Seurat pipeline of cluster definition via PCA -> kNN -> Louvain, the downstream UMAP embedding puts cells from the same PCA-based cluster at two opposite extremes of the 2D UMAP representation. This happens even though the UMAP makes sense when you look at known marker genes (for example, there are gradients of gradually decreasing expression, or patches where expressing cells are locally concentrated).
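One way to put a number on this kind of disagreement is to cluster once on the PCA space and once on the UMAP coordinates, then compare the two label vectors with the adjusted Rand index (ARI). The following is a minimal sketch using only the Python standard library; the function name and the toy label vectors are hypothetical, and in practice the labels would be exported from your own Seurat object.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same cells.

    1.0 means identical partitions (up to label renaming); values near 0
    mean agreement no better than expected by chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label vectors must cover the same cells")
    n = len(labels_a)
    if n < 2:
        return 1.0  # fewer than two cells: no pairs to disagree on

    # Contingency counts: cells sharing a cluster in A, in B, and in both.
    joint = Counter(zip(labels_a, labels_b))
    pairs_joint = sum(comb(c, 2) for c in joint.values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial (one cluster or all singletons)
    return (pairs_joint - expected) / (maximum - expected)


# Hypothetical toy labels for six cells: clusters found on PCA vs. on UMAP.
pca_clusters = [0, 0, 0, 1, 1, 1]
umap_clusters = [0, 0, 1, 1, 2, 2]
ari = adjusted_rand_index(pca_clusters, umap_clusters)
print(f"ARI between PCA-based and UMAP-based clusters: {ari:.3f}")
```

A low ARI only says that the two partitions differ; it does not say which one is closer to the biology, so marker genes are still needed to judge that.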
We are always told that we should use PCA for cell clustering because it doesn't distort the Euclidean distances between the cells, unlike all the methods mentioned above (except for ICA). But what if I liked my UMAP embedding more than the PCA? What if I thought that it had done a better job of emphasizing the short distances between cells (see the last sentence of the previous paragraph), which probably correspond to cell states or cell types?
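To make the "PCA doesn't distort distances" argument concrete: a full PCA is just a centering followed by a rotation, so pairwise Euclidean distances are unchanged, and keeping only the top components can shrink distances but never stretch them. Below is a small standard-library sketch on hypothetical 2D points (the function names and the data are made up for illustration); UMAP and t-SNE give no comparable guarantee.

```python
import math
from itertools import combinations


def pca_rotate_2d(points):
    """Center 2D points and rotate them onto their principal axes."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Angle of the first principal axis for a 2x2 symmetric matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Each output pair is (PC1 score, PC2 score).
    return [(c * x + s * y, -s * x + c * y) for x, y in centered]


def pairwise_distances(points):
    """Euclidean distance for every pair of points, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


# Hypothetical toy "cells" measured on two features.
cells = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
scores = pca_rotate_2d(cells)

original = pairwise_distances(cells)
full_pca = pairwise_distances(scores)                     # both components kept
pc1_only = pairwise_distances([(pc1,) for pc1, _ in scores])  # truncated to PC1

max_change = max(abs(a - b) for a, b in zip(original, full_pca))
print(f"largest distance change under full PCA: {max_change:.2e}")
print("PC1-only distances never exceed the originals:",
      all(d1 <= d0 + 1e-12 for d0, d1 in zip(original, pc1_only)))
```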
Thanks, I like the VAE-based approach, I'll try it! It does seem to be a bit of a black-box type of clustering, though.