How to identify cluster membership from the tSNE coordinates?
4.3 years ago
Researcher ▴ 130

How can I extract or identify the cluster membership of each sample from the t-SNE coordinates generated by Rtsne? It's really urgent; kindly help.

Thanks

tSNE dimention reduction dimensionality reduction • 3.0k views

Please consider investing some more effort in your post: add some details and show what you've tried.


Hi, sorry for the incomplete question. It's a two-dimensional reduction of TCGA methylation data. I used Rtsne() as shown below and found three distinct clusters, corresponding to the three main groups in the data.

library(Rtsne)
# eta is Rtsne's learning-rate argument (the original post passed learning=200)
tsne_realData <- Rtsne(meth, perplexity = 30, max_iter = 500, eta = 200)
plot(tsne_realData$Y, pch = 21, cex = 1)

But I also need sample-wise cluster membership in order to identify the exact samples that fall in mixed regions. Can you please help, or point me to an example workflow?

Thanks


I assume you're aware of the pitfalls of clustering in t-SNE space. That said, it can be legitimate to want to identify which data point falls into which group, especially as part of exploratory data analysis. Since distances are not meaningful in t-SNE space, I would suggest using a density-based clustering algorithm such as dbscan. I would also suggest trying UMAP (e.g. from the R package uwot) instead of t-SNE. In my experience (although on non-bioinformatics data), UMAP tends to be less sensitive to parameter choice, and I can extract clusters more robustly with dbscan. Note, however, that UMAP is subject to the same issues as t-SNE as far as clustering is concerned.
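
As a minimal sketch of the dbscan suggestion above, assuming the Rtsne and dbscan packages are installed: the `meth` matrix here is simulated stand-in data (not TCGA), the sample names are made up, and the `eps` and `minPts` values are guesses that would need tuning for a real embedding.

```r
library(Rtsne)
library(dbscan)

set.seed(123)  # t-SNE is stochastic; fix the seed for a repeatable layout

# Simulated stand-in for the methylation matrix: 150 samples x 50 features,
# drawn from three shifted groups (hypothetical data).
group <- rep(1:3, each = 50)
meth <- matrix(rnorm(150 * 50), nrow = 150) + 4 * group
rownames(meth) <- paste0("sample_", seq_len(nrow(meth)))

tsne <- Rtsne(meth, perplexity = 30, max_iter = 500)

# Density-based clustering on the 2-D embedding.
# eps and minPts are assumptions; tune them for each embedding.
db <- dbscan::dbscan(tsne$Y, eps = 3, minPts = 5)

# Rows of tsne$Y are in the same order as the rows of the input matrix,
# so the cluster labels can be attached directly to the sample names.
membership <- data.frame(
  sample  = rownames(meth),
  tsne_1  = tsne$Y[, 1],
  tsne_2  = tsne$Y[, 2],
  cluster = db$cluster  # 0 means noise: not assigned to any cluster
)

table(membership$cluster)
```

Samples labelled 0 are the ones dbscan could not place in any dense region, which may be a useful starting point for looking at samples in mixed regions. The same pattern should work on a UMAP embedding by swapping the coordinate matrix.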


I may also add that, beyond the very good points made by Jean-Karim Heriche, you should always set a constant seed before running t-SNE, UMAP and similar methods, as shown below. These methods have random elements (they are non-deterministic), so the results will change every time you run the code unless you set the seed.

set.seed(123)
your.code(...)
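
A small sketch of what the seed buys you, assuming the Rtsne package with its default single-threaded settings. The input matrix `x` and the perplexity value are hypothetical choices for illustration only.

```r
library(Rtsne)

# Hypothetical toy input: 100 samples x 20 features.
set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100)

set.seed(123)
run_a <- Rtsne(x, perplexity = 10)$Y

set.seed(123)  # same seed again
run_b <- Rtsne(x, perplexity = 10)$Y

run_c <- Rtsne(x, perplexity = 10)$Y  # seed not reset before this run

# Compare the embeddings coordinate by coordinate.
same_seed_match <- isTRUE(all.equal(run_a, run_b))
unseeded_match  <- isTRUE(all.equal(run_a, run_c))

print(c(same_seed = same_seed_match, unseeded = unseeded_match))
```

Resetting the seed immediately before each call is what matters; a single `set.seed()` at the top of a script only helps if the code in between does not change.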
4.3 years ago
Mensur Dlakic ★ 28k

tSNE reflects mostly local relationships but doesn't preserve distance between data points, so we can't use distance-based clustering techniques. Instead, density-based clustering techniques are recommended.
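
Density-based methods such as DBSCAN still need a neighbourhood radius, so here is a rough sketch of one way to pick it, assuming the dbscan R package. The embedding `emb` is simulated to stand in for t-SNE coordinates, and the 95% quantile rule is only a crude heuristic of my own choosing, not a recommendation from the package.

```r
library(dbscan)

# Hypothetical 2-D embedding standing in for t-SNE output:
# three blobs of 50 points each.
set.seed(42)
centers <- rbind(c(-20, 0), c(0, 20), c(20, 0))
emb <- centers[rep(1:3, each = 50), ] + matrix(rnorm(300), ncol = 2)

min_pts <- 5

# Distance from each point to its (min_pts - 1)-th nearest neighbour.
knn_d <- kNNdist(emb, k = min_pts - 1)

# Crude heuristic (an assumption): use a high quantile of those distances
# as eps. Inspecting kNNdistplot(emb, k = min_pts - 1) by eye is an alternative.
eps_guess <- unname(quantile(knn_d, 0.95))

db <- dbscan::dbscan(emb, eps = eps_guess, minPts = min_pts)

# Points labelled 0 were left unassigned (noise).
noise_idx <- which(db$cluster == 0)
table(db$cluster)
```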

There are many software solutions that cluster based on tSNE embedding of data points, or indeed from other embedding techniques. For example, pick any of the single-cell tools such as Seurat. There should be many examples on Biostars if you spend some time searching.

I will give you one that is meant for separating a mixture of (meta)genomic DNA sequences into bins. That is different from what you are doing, but the underlying methodology is the same once the embedding is calculated.

https://github.com/BinPro/CONCOCT
