Question

Spectral Clustering for TCGA/Gene Expression Data

5.1 years ago

aaragak1 ▴ 40

Hello all,

I've been using M3C for consensus clustering on TCGA data to estimate how many 'real' clusters of tumors are in these data. I'm wondering if and when spectral clustering should be applied rather than the other algorithms (hc, pam, km).

Thanks for your time!

R RNA-Seq • 986 views

updated 5.1 years ago by Jean-Karim Heriche 27k • written 5.1 years ago by aaragak1 ▴ 40

score 2 · Accepted Answer · 2020-04-15

As usual when it comes to clustering, the answer is 'it depends'. PAM and k-means assume that clusters are more or less spherical. Hierarchical clustering makes different assumptions depending on the linkage method: Ward's criterion also assumes spherical clusters, and average and complete linkage tend to work best on spherical clusters too. Hierarchical clustering doesn't directly produce clusters but a tree that needs to be cut to produce them, and how you cut the tree also affects the number of clusters.

Spectral clustering can be useful for finding clusters in more complicated situations: it essentially tries to find a space in which the clusters are well separated and spherical (if one uses k-means at the clustering step). The result also depends on how you measure distance/similarity. So unless there is a clear cluster structure, different algorithms will produce different clusters. Also keep in mind that what an algorithm calls a cluster may or may not make sense to you.

I would suggest exploring the data first by plotting various representations, then trying hierarchical clustering to get a feeling for whether there are distinguishable clusters.
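To make the point about spherical versus non-spherical clusters concrete, here is a minimal sketch on toy data (two concentric rings, not expression data). Everything in it is an assumption made for illustration: the ring radii, the neighbourhood threshold `eps`, the hand-rolled helper names, and the simplification of splitting on the sign of the Fiedler vector instead of running k-means on the spectral embedding. It is not how M3C or any other package implements spectral clustering.

```python
import math


def ring(radius, n):
    """Return n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]


def kmeans2(points, iters=50):
    """Plain Lloyd's k-means with k=2, seeded with the first and last point."""
    centers = [points[0], points[-1]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centre if a cluster becomes empty
                centers[k] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels


def spectral2(points, eps, iters=1000):
    """Two-way spectral partition from the sign of the Fiedler vector."""
    n = len(points)
    # Affinity graph: connect two distinct points if they are closer than eps.
    W = [[1.0 if i != j and math.dist(points[i], points[j]) < eps else 0.0
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg)  # upper bound on the largest eigenvalue of L = D - W
    # Power iteration on (c*I - L), kept orthogonal to the constant vector,
    # converges to the eigenvector of L with the second-smallest eigenvalue.
    x = [float(i) for i in range(n)]
    for _ in range(iters):
        y = [(c - deg[i]) * x[i] + sum(W[i][j] * x[j] for j in range(n))
             for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return [0 if v < 0 else 1 for v in x]


def same_partition(a, b):
    """True if two labelings agree up to swapping the label names."""
    return a == b or a == [1 - v for v in b]


# Interleave the rings so the point order carries no cluster information.
inner, outer = ring(1.0, 12), ring(3.0, 12)
points = [p for pair in zip(inner, outer) for p in pair]
truth = [0, 1] * 12

kmeans_labels = kmeans2(points)
spectral_labels = spectral2(points, eps=1.8)

print("k-means recovers the rings: ", same_partition(kmeans_labels, truth))
print("spectral recovers the rings:", same_partition(spectral_labels, truth))
```

k-means with two centres always splits the plane along a straight line, so it cannot separate a ring from the ring enclosing it. The spectral step works on the neighbourhood graph instead, where the two rings are simply two groups of connected points. On real expression data the choice of similarity measure and graph construction matters at least as much as the algorithm, and an established implementation is preferable to this hand-written version.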