Cytoscape parameters for k-means clustering
9 months ago
Phenylananin ▴ 20

Hello,

I want to cluster a large dataset of almost 5000 proteins into about 200 clusters. To fix the number of clusters, I wanted to use k-means clustering, since this works well in the STRING web tool with smaller datasets. Since there is no k-means option among the network cluster algorithms, I tried the attribute cluster k-means instead.

But the only attributes shown are compartments and tissues. I tried setting the edge column to stringdb::score, but this put all proteins in cluster zero. How can I run k-means clustering on the distance matrix derived from the STRING global scores (as in the web tool)? Or is there another clustering option that lets me predefine the cluster sizes or the number of clusters?

Many thanks in advance!

cytoscape clustermaker clustering k-means • 669 views
9 months ago
xanderpico ▴ 580

Thanks for the question. A couple of suggestions and observations:

  1. In the k-means cluster settings, I see "stringdb::interactor score" (or "stringdb::disease score" for disease queries) in the list of Node Attributes, right in the middle of the list between compartments and tissues.
  2. Clustering on edge score also resulted in zero clusters for me in two test cases. Not sure why...
  3. I typically recommend MCL clustering for STRING networks. The developers of the stringApp for Cytoscape even added it as a button in the right-side panel. You can increase the granularity to effectively increase the number of clusters returned. You can also choose to focus on the top-n clusters regardless of granularity (e.g., if you want to keep the same value across multiple networks).
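To illustrate what the granularity setting in point 3 does, here is a minimal pure-Python sketch of the MCL idea (expansion followed by inflation). It is a toy version for intuition only, not the clusterMaker implementation; the function names, the toy network, and the assumption that "granularity" corresponds to the inflation exponent are mine. Higher inflation generally tends to give more, smaller clusters.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(weights, inflation=2.0, iterations=30):
    """Toy Markov clustering on a symmetric weight matrix (list of lists)."""
    n = len(weights)
    # Add self-loops, then turn the weights into transition probabilities.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power and renormalize, which
        # strengthens strong flows and weakens weak ones.
        m = normalize_columns([[v ** inflation for v in row] for row in m])
    # Rows that still carry flow are attractors; their nonzero columns are
    # the cluster members. Identical rows are reported once.
    clusters = []
    for row in m:
        members = sorted(j for j, v in enumerate(row) if v > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Hypothetical toy network: two triangles joined by one weak edge (2-3).
W = [[0.0] * 6 for _ in range(6)]
for a, b, s in [(0, 1, 0.9), (1, 2, 0.9), (0, 2, 0.9),
                (3, 4, 0.9), (4, 5, 0.9), (3, 5, 0.9), (2, 3, 0.2)]:
    W[a][b] = W[b][a] = s

for r in (1.4, 2.0, 4.0):
    print(r, mcl(W, inflation=r))
```

Running the loop with a few inflation values on your own network is a quick way to see how the cluster count responds before settling on a setting.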

Thanks for the answer. I just checked, and I don't have stringdb::interactor score among the attributes. In my network tables I only have stringdb::score in the edge table; stringdb::interactor score is missing from all of my tables. Perhaps this is my problem. MCL gives me several small clusters and one huge cluster. Even when I play around with the settings, I still end up with a highly unbalanced number of proteins per cluster.
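For a fixed number of clusters outside Cytoscape, one option would be to export the edge table and cluster on distances derived from stringdb::score. The sketch below is only an illustration under my own assumptions: it uses k-medoids rather than k-means (k-medoids works directly on a distance matrix), it takes distance = 1 - score with unconnected pairs set to 1.0, and the function names and toy protein names are made up. It is not what the STRING web tool or clusterMaker does internally.

```python
def scores_to_distances(nodes, edges):
    """Dense distance matrix: 1 - score for edges, 1.0 for unconnected pairs."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    dist = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for a, b, score in edges:
        i, j = index[a], index[b]
        dist[i][j] = dist[j][i] = 1.0 - score
    return dist


def k_medoids(dist, k, max_iter=100):
    """Assign each item to one of k clusters using only pairwise distances."""
    n = len(dist)
    # Deterministic farthest-first initialization, starting from item 0.
    medoids = [0]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(max(candidates,
                           key=lambda i: min(dist[i][m] for m in medoids)))
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: nearest medoid wins.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        # Update step: the member with the smallest total distance to the
        # other members becomes the new medoid.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster emptied
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members,
                                   key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


# Hypothetical toy edge table: (source, target, stringdb::score).
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B", 0.9), ("B", "C", 0.8), ("A", "C", 0.85),
         ("D", "E", 0.9), ("E", "F", 0.95), ("D", "F", 0.7),
         ("C", "D", 0.2)]

labels = k_medoids(scores_to_distances(nodes, edges), k=2)
print(dict(zip(nodes, labels)))
```

Note that the dense matrix grows quadratically, so for about 5000 proteins this toy version would be slow; it is meant to show the idea of fixing k, and it does not by itself guarantee balanced cluster sizes.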
