Hello,
I want to cluster a large dataset of almost 5000 proteins into about 200 clusters. To fix the number of clusters, I wanted to use k-means clustering, since it works well in the STRING web tool with smaller datasets. Since there is no k-means option among the network cluster algorithms, I tried k-means under the attribute cluster algorithms.
However, the only attributes shown are compartments and tissues. I tried setting the edge column to stringdb::score, but this put all proteins in cluster zero. How can I run k-means clustering on the distance matrix derived from the STRING global scores (as in the web tool)? Or is there another clustering option that lets me predefine the cluster sizes or the number of clusters?
Many thanks in advance!
Thanks for the answer. I just checked, and I don't have the stringdb::interactor score among the attributes. In my network tables I only have stringdb::score, in the edge table; the stringdb::interactor score is missing from all of my tables. Perhaps this is my problem. MCL gives me several small clusters and one huge cluster. Even when I play around with the settings, I still end up with a highly unbalanced number of proteins per cluster.
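
To make clearer what I am after, here is a minimal sketch of clustering into a fixed number of groups from the edge scores outside Cytoscape. It is not the STRING web tool's method: it uses k-medoids (a swapped-in technique that works directly on a distance matrix) rather than k-means, it takes the distance to be 1 - stringdb::score, and it assumes that pairs without an edge have distance 1.0. The function names and the toy edge list are hypothetical. It fixes the number of clusters, not their sizes, and for ~5000 proteins an array-based matrix would be more practical than the dictionary used here.

```python
from itertools import product


def distance_matrix(edges, default=1.0):
    """Build a symmetric distance lookup from (protein_a, protein_b, score) edges.

    Assumption: distance = 1 - score; pairs without an edge get `default`.
    """
    nodes = sorted({n for a, b, _ in edges for n in (a, b)})
    dist = {(a, b): (0.0 if a == b else default)
            for a, b in product(nodes, repeat=2)}
    for a, b, score in edges:
        dist[(a, b)] = dist[(b, a)] = 1.0 - score
    return nodes, dist


def k_medoids(nodes, dist, k, max_iter=100):
    """Partition `nodes` into k clusters by alternating assignment and medoid update."""
    # Deterministic farthest-first initialisation of the k medoids.
    medoids = [nodes[0]]
    while len(medoids) < k:
        candidates = [n for n in nodes if n not in medoids]
        medoids.append(max(candidates,
                           key=lambda n: min(dist[(n, m)] for m in medoids)))

    for _ in range(max_iter):
        # Assignment step: each medoid stays in its own cluster,
        # every other node joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for n in nodes:
            if n in clusters:
                clusters[n].append(n)
            else:
                nearest = min(medoids, key=lambda m: dist[(n, m)])
                clusters[nearest].append(n)

        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = [min(members,
                           key=lambda c: sum(dist[(c, o)] for o in members))
                       for members in clusters.values()]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids

    # Map each node to an integer cluster label.
    return {n: i for i, members in enumerate(clusters.values()) for n in members}


# Toy edge list standing in for an exported edge table (hypothetical values).
edges = [
    ("A", "B", 0.90), ("B", "C", 0.80), ("A", "C", 0.85),
    ("D", "E", 0.95), ("E", "F", 0.90), ("D", "F", 0.70),
    ("C", "D", 0.20),
]
nodes, dist = distance_matrix(edges)
labels = k_medoids(nodes, dist, k=2)
print(labels)
```

With the real data, the edge list would come from the exported edge table and k would be set to 200.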