Hello everyone, I am working on scRNA-seq data with multiple clustering algorithms. I want to assess the quality of the clustering with the silhouette plot function of the "cluster" package. As far as I know, the function requires a distance matrix. My data is stored in a Seurat object, and clustering with the Seurat method provides an SNN matrix. Which option would be better to use?
1) Using the SNN matrix as the distance matrix.
2) Calculating a new distance matrix with the dist() function. My data is really big, so I am sure this will take a long time. I can create the distance matrix from the principal components (let's say the first 50 PCs).
Thank you in advance.
To get the SNN, the distance matrix had to be computed, so it should be available somewhere. The SNN matrix is a similarity matrix (it is based on the Jaccard index), so it may need to be converted to a distance.
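As a rough illustration of that conversion, here is a minimal sketch in Python with NumPy. It assumes the similarity values lie in [0, 1], as a Jaccard index does, and uses distance = 1 - similarity. The function name `similarity_to_distance` and the toy matrix are made up for the example; they are not part of Seurat or the "cluster" package.

```python
import numpy as np

def similarity_to_distance(sim):
    """Turn a similarity matrix with values in [0, 1] into a distance matrix."""
    sim = np.asarray(sim, dtype=float)
    dist = 1.0 - sim                 # high similarity -> small distance
    np.fill_diagonal(dist, 0.0)      # every cell is at distance 0 from itself
    return dist

# Hypothetical 4-cell similarity matrix (illustrative values, not real data)
toy_snn = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])
toy_dist = similarity_to_distance(toy_snn)
print(toy_dist)
```

Note that every pair of cells with similarity 0 ends up at the maximum distance of 1, so a distance derived this way is coarser than one computed directly from the principal components.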
Thank you for your answer. I tried many different data sets, but I have another problem with all of them: every cluster score in the silhouette plot is -1. Is this a common problem? I used my own data and published data sets, but nothing changed.
A low silhouette value suggests inadequate clustering, but without seeing the data and the processing, it's impossible to say what's going on. One possibility is that you're computing it on a similarity matrix rather than on a distance matrix.
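To show how that mistake could play out, here is a small hand-written sketch in Python with NumPy. It implements the standard silhouette definition, s = (b - a) / max(a, b), where a is the mean distance to the cell's own cluster and b is the smallest mean distance to any other cluster. The function `silhouette_widths` and the toy values are hypothetical; this is not the code of the "cluster" package.

```python
import numpy as np

def silhouette_widths(dist, labels):
    """Per-point silhouette widths from a precomputed distance matrix."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    widths = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False                      # exclude the point itself
        if not same.any():
            continue                         # singleton cluster: width stays 0
        a = dist[i, same].mean()             # mean distance within own cluster
        b = min(dist[i, labels == c].mean()  # nearest other cluster
                for c in clusters if c != labels[i])
        denom = max(a, b)
        widths[i] = (b - a) / denom if denom > 0 else 0.0
    return widths

# Hypothetical similarity matrix: two clean clusters that share no neighbours
sim = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.7],
    [0.0, 0.0, 0.7, 1.0],
])
labels = np.array([0, 0, 1, 1])

# Mistake: passing the similarity matrix as if it were a distance matrix
wrong = silhouette_widths(sim, labels)

# Intended use: convert to a distance first (1 - similarity, zero diagonal)
dist = 1.0 - sim
np.fill_diagonal(dist, 0.0)
right = silhouette_widths(dist, labels)

print(wrong)   # every width is -1
print(right)   # every width is positive
```

In this toy case the similarity between clusters is 0, so b is 0 and each width collapses to exactly -1 even though the clustering is perfect. If the SNN graph has no edges between cells of different clusters, that could explain scores of -1 across all data sets, but this is only one possible explanation.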
Hello again, and thank you for your answer. As you said, I just realized that the algorithms I use create a similarity matrix, I believe. I will use the modularity function provided by the scran package instead.
Thank you for your help.