Silhouette plots in R
7.7 years ago
Ron ★ 1.2k

Hi all,

I want to create Silhouette plots in R. I have the code for that:

library(cluster)
library(vegan)

dis <- vegdist(exp_matrix)
res <- pam(dis, 3)                      # choice of clustering algorithm
sil <- silhouette(res$clustering, dis)  # cluster vector
pdf('my_nice_plot.pdf')
plot(sil, col = meta$Colors)
dev.off()

My "meta" table consists of sample names, the grouping category, and their colors. Basically, I want to color the samples by their known grouping category. I am specifying 3 clusters because there are 3 grouping categories. The issue is that some samples fall into a different cluster than the one matching their grouping category.

I want 3 clusters with 3 colors, and if a sample does not fit its cluster, it should be shown as anti-correlated (a bar pointing in the opposite direction) rather than being placed in another cluster. That way, each of the 3 clusters would have a single color.

Any suggestions?

Below is my exp_matrix:


Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8 Sample9 Sample10
1.335102054 3.077470899 3.138065706 2.554060461 2.533176175 3.787130648 2.017600408 2.406238299 2.571645353 2.786922944
1.856828447 2.903459704 2.032062343 2.089422039 2.181253692 2.168857947 1.340714464 1.334107714 1.252602475 1.599683962
0.997564505 1.003324612 0.937807943 1.033256787 2.01130398  0.997948553 1.459188016 0.948419986 0.933616747 0.938739858
19.91490203 161.3801497 2.974933925 1.15985526  95.63030172 6.869383772 6.224809354 22.43439844 21.77444457 26.02266932
1.235250155 2.752533398 1.852711702 1.294324019 2.202763936 1.221033762 1.094792065 1.070960481 1.1242694   1.089553158
1.01820685  1.177473999 1.000458518 1.028822213 1.321418848 1.089197025 1.079738645 0.997618434 0.997463311 0.993578144
1.107500627 1.074571335 1.110524814 1.192469082 1.047676121 1.127642088 1.01099336  0.993979302 1.092574354 0.987137348
1.036566994 1.008010924 1.04684827  1.055822448 1.087434494 1.01645263  1.054939718 1.059915024 1.04011888  1.043169394
1.771508775 4.541376768 6.247271229 1.698569724 4.300979691 1.922687958 3.996811113 5.225271269 3.427816413 3.586772962
0.997064599 1.054684913 1.02028153  1.01396626  1.031323032 1.020142345 1.050812112 1.021389599 1.04308284  1.048526295
11.97373936 11.09493899 12.56579193 16.26164455 16.95326009 11.59602467 7.035111423 8.062786948 11.9170942  10.97490212
4.563845863 4.417316767 2.868204674 9.889057888 7.385072105 2.580583255 7.461528487 11.89242726 18.39414879 18.86188122
1.000273446 1.000761958 0.998605053 0.995449223 1.040481771 0.999784929 1.033818514 1.026504592 0.984465721 0.98577008
0.981661587 2.521671659 1.006165314 0.97749202  0.999569316 0.979894883 1.304261432 0.983144406 0.980168076 0.97749202
1.004147694 1.013271707 1.006286227 0.997838269 1.01209634  1.004147694 1.036567987 1.001845662 1.001209392 0.997336005
1.005404334 1.005404334 1.005894537 0.998665037 1.015797531 1.009839843 1.023461132 1.019771494 0.998482364 0.998938971
0.998573504 0.998573504 1.019684068 1.014517921 1.018093706 0.998573504 1.033454584 1.008350063 1.004199745 1.007911467
4.435206425 2.571609049 2.202237633 18.22954904 10.39052668 1.281203044 2.51255292  2.786338681 2.947128775 3.972638039
1.25034501  3.454024869 2.532858896 3.067917607 1.858659586 1.57838548  1.959222293 3.429776931 2.838722643 3.075910635
3.684780859 6.868469943 6.94562784  8.108387027 8.395853627 6.062065966 3.533193809 5.382000926 9.113293535 8.081187443

Here is the metadata:

Analysis_id Classification  Colors
Sample1 Clus1   red
Sample2 Clus1   red
Sample3 Clus1   red
Sample4 Clus1   red
Sample5 Clus2   green
Sample6 Clus2   green
Sample7 Clus2   green
Sample8 Clus2   green
Sample9 Clus3   orange
Sample10    Clus3   orange

Thanks, Ron

RNA-Seq silhouette clustering R • 4.8k views

We can't reproduce your code without data (exp_matrix)

7.7 years ago
Mattias Aine ▴ 640

Do you want to cluster the columns or the rows? I interpret your question as pertaining to the columns (samples).

Your code currently calculates the distance for rows.

dis <- vegdist(t(exp_matrix))  # distance between samples (columns)

Also, you mention correlation, but vegdist calculates Bray-Curtis dissimilarity by default.
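If a correlation-based distance is what was intended, a minimal sketch could look like the following. It assumes that "1 minus the Pearson correlation" is an acceptable distance, which is only one common choice, and it uses a small random toy matrix as a stand-in for exp_matrix. Note that cor() already works on columns, so no transpose is needed here.

```r
# Toy stand-in for exp_matrix (assumption): features in rows, samples in columns
set.seed(1)
exp_matrix <- matrix(rexp(60), nrow = 10,
                     dimnames = list(NULL, paste0("Sample", 1:6)))

# cor() correlates the columns, i.e. the samples, with each other
sample_cor <- cor(exp_matrix, method = "pearson")

# Turn correlation into a distance: 0 = perfectly correlated, 2 = perfectly anti-correlated
cor_dis <- as.dist(1 - sample_cor)

print(round(as.matrix(cor_dis), 2))
```

The resulting object has class "dist", so it can be passed to pam() and silhouette() in place of the vegdist result.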

The rest is a bit harder to follow, but from what I gather you want to group and color the samples by meta$Classification and then plot a negative bar if pam doesn't put a sample in the same cluster? That doesn't really sound like what a silhouette plot is for, unless I have misunderstood.
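One possible reading of the request is to skip pam for the plot and hand the known grouping to silhouette() as the cluster vector. Each sample then stays in its own group, and a sample that is on average closer to another group gets a negative silhouette width, i.e. a bar pointing the other way. The sketch below is only an illustration under assumptions: the matrix is random toy data, dist() stands in for vegdist(t(exp_matrix)) so that only the cluster package is needed, and the output file name is made up.

```r
library(cluster)

# Toy stand-ins (assumptions): 20 features x 10 samples, metadata as in the post
set.seed(42)
exp_matrix <- matrix(rexp(200), nrow = 20,
                     dimnames = list(NULL, paste0("Sample", 1:10)))
meta <- data.frame(
  Analysis_id    = paste0("Sample", 1:10),
  Classification = rep(c("Clus1", "Clus2", "Clus3"), times = c(4, 4, 2)),
  Colors         = rep(c("red", "green", "orange"), times = c(4, 4, 2)),
  stringsAsFactors = FALSE
)

# Distance between samples (columns); replace with vegdist(t(exp_matrix)) if desired
dis <- dist(t(exp_matrix))

# Use the known grouping, coded as integers 1..3, instead of the pam result
grp   <- factor(meta$Classification)
known <- as.integer(grp)
sil   <- silhouette(known, dis)
rownames(sil) <- meta$Analysis_id

# One color per group, in the order of the factor levels
group_cols <- meta$Colors[match(levels(grp), meta$Classification)]

out_file <- file.path(tempdir(), "silhouette_known_groups.pdf")  # hypothetical file name
pdf(out_file)
plot(sil, col = group_cols, main = "Silhouette by known grouping")
dev.off()

# Optional: compare the known grouping with what pam finds
res <- pam(dis, 3)
print(table(known = meta$Classification, pam = res$clustering))
```

Passing a color vector with one entry per group keeps each group in a single color. The cross-table at the end shows which samples pam assigns differently from the known classification.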
