Question

Cytoscape parameters for k-means clustering


14 months ago

Phenylananin ▴ 20

Hello,

I want to cluster a large dataset of almost 5000 proteins into about 200 clusters. To fix the number of clusters, I wanted to use k-means clustering, since it works well in the STRING web tool with smaller datasets. Since there is no k-means option among the network cluster algorithms, I tried the k-means option under the attribute cluster algorithms instead.

But the only attributes shown are compartments and tissues. I tried setting the edge column to stringdb::score, but this put all proteins in cluster zero. How can I run k-means clustering on the distance matrix derived from the STRING global scores (as in the web tool)? Or is there another clustering option that lets me predefine the cluster size or the number of clusters?

Many thanks in advance!

cytoscape clustermaker clustering k-means • 876 views


score 1 · Answer 1 · 2024-02-07

Thanks for the question. A couple suggestions and observations...

In the k-means cluster settings, I see "stringdb::interactor score" (or "stringdb::disease score" for disease queries) in the list of Node Attributes, right in the middle of the list between compartments and tissues.
Clustering on edge score also resulted in zero clusters for me in two test cases. Not sure why...
I typically recommend MCL clustering for STRING networks. The developers of the stringApp for Cytoscape even added it as a button in the right-side panel. You can increase the granularity to effectively increase the number of clusters returned. You can also choose to focus on the top-n clusters regardless of granularity (e.g., if you want to keep the same value across multiple networks).
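
To give a feel for what the granularity setting does, here is a minimal pure-Python sketch of Markov clustering on a weighted edge list. It is a toy illustration only, not the clusterMaker2 or stringApp implementation. The function name `toy_mcl`, the example edges and the weights are all made up, and I am assuming the granularity setting corresponds to MCL's inflation parameter.

```python
def toy_mcl(edges, inflation=2.0, iterations=50, tol=1e-6):
    """Toy Markov clustering on (node_a, node_b, weight) edges (hypothetical helper)."""
    nodes = sorted({name for a, b, _ in edges for name in (a, b)})
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)

    # Symmetric weighted adjacency matrix with self-loops on the diagonal.
    m = [[0.0] * n for _ in range(n)]
    for a, b, w in edges:
        m[index[a]][index[b]] = w
        m[index[b]][index[a]] = w
    for i in range(n):
        m[i][i] = 1.0

    def normalize(mat):
        # Rescale every column so that it sums to 1 (column-stochastic).
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= total
        return mat

    m = normalize(m)
    for _ in range(iterations):
        # Expansion: matrix square, i.e. two steps of a random walk.
        expanded = [
            [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # Inflation: element-wise power, then renormalize the columns.
        inflated = normalize([[v ** inflation for v in row] for row in expanded])
        delta = max(
            abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n)
        )
        m = inflated
        if delta < tol:
            break

    # Each row that still carries mass defines a cluster: its members are
    # the columns (nodes) with non-negligible flow into that row.
    clusters = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Made-up example: two tight triangles joined by one weak edge.
demo_edges = [
    ("A", "B", 0.9), ("A", "C", 0.9), ("B", "C", 0.9),
    ("D", "E", 0.9), ("D", "F", 0.9), ("E", "F", 0.9),
    ("C", "D", 0.2),
]
for value in (1.5, 2.0, 4.0):
    result = toy_mcl(demo_edges, inflation=value)
    print(value, len(result), result)
```

Higher inflation values generally push MCL toward more, smaller clusters, which is why raising the granularity is the knob to turn if you are aiming for roughly 200 clusters. MCL does not let you fix the exact number the way k-means does, so expect to try a few values.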