I am working on the analysis of a weighted co-expression network. My next step is to measure network topology in terms of betweenness, closeness, and degree. For this I am using the CytoNCA app within Cytoscape. I am curious to know whether I need to apply a threshold value to the betweenness, closeness, and degree scores in order to call the top-ranked genes hub genes of the network.
I don't believe there are any pre-defined or standardised cut-offs due to the fact that network metrics can vary tremendously based on numerous other parameters, such as:
distance metric used for network construction (e.g. Euclidean distance, correlation, etc)
total number of nodes / vertices
any pre-filtering on edge values (e.g. removing weak edges)
the nature of the data, e.g., a network constructed from differentially expressed genes, using the group of samples in which they were found to be differentially expressed, will likely produce a 'stronger' network than one produced from randomly selected genes
whether you're plotting just a graph or a minimum spanning tree of the graph
et cetera
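To illustrate one of these points, here is a minimal Python sketch of how pre-filtering weak edges changes degree. The gene names, edge weights, and the 0.5 cut-off are made up for illustration, and `degree_after_filter` is a hypothetical helper, not part of CytoNCA or Cytoscape.

```python
# Toy weighted edge list: (node, node, weight). All values are invented.
edges = [
    ("geneA", "geneB", 0.9),
    ("geneA", "geneC", 0.3),
    ("geneA", "geneD", 0.6),
    ("geneB", "geneC", 0.2),
    ("geneC", "geneD", 0.8),
]

def degree_after_filter(edges, min_weight):
    """Count, for each node, the edges whose weight is at least min_weight."""
    degree = {}
    for u, v, w in edges:
        # Register both nodes so that isolated nodes still appear with degree 0.
        degree.setdefault(u, 0)
        degree.setdefault(v, 0)
        if w >= min_weight:
            degree[u] += 1
            degree[v] += 1
    return degree

unfiltered = degree_after_filter(edges, 0.0)  # keep every edge
filtered = degree_after_filter(edges, 0.5)    # drop edges weaker than 0.5
```

In this toy case geneA and geneC tie for the highest degree before filtering, while geneA and geneD tie afterwards, so the same fixed cut-off on degree would pick different genes.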
Also, these scores are presented differently in different studies. For example, hub scores, closeness centrality, and betweenness centrality can be presented either scaled to 0-1 or as 'raw' scores. Degree is obviously just degree.
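As a rough sketch of the difference between 'raw' and 0-1 scaled scores, the Python snippet below applies min-max scaling to some invented betweenness values. Min-max scaling is only one possible convention (tools may instead divide by a theoretical maximum), and `min_max_scale` is a hypothetical helper.

```python
def min_max_scale(scores):
    """Rescale a dict of raw scores to the 0-1 range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: nothing to spread out, so return zeros.
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}

# Invented raw betweenness values for four genes.
raw_betweenness = {"geneA": 12.0, "geneB": 0.0, "geneC": 3.0, "geneD": 6.0}
scaled = min_max_scale(raw_betweenness)

# Scaling changes the values but not the order of the genes.
rank_raw = sorted(raw_betweenness, key=raw_betweenness.get, reverse=True)
rank_scaled = sorted(scaled, key=scaled.get, reverse=True)
```

Because the ranking is identical either way, a rank-based selection gives the same genes regardless of which presentation a tool or paper uses.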
Thus, your choice of threshold should be based on the rank of the scores. Higher scores obviously indicate a more important vertex. Once you take a look over the results, you'll get a feeling of where you should be setting thresholds.
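A minimal Python sketch of rank-based selection follows, assuming you have exported the scores as a name-to-score mapping. The degree values are invented, `top_ranked` is a hypothetical helper, and the fraction to keep is your own choice rather than a standard.

```python
import math

def top_ranked(scores, fraction):
    """Return the names in the top `fraction` of scores, highest first.

    Ties keep their original order in `scores`, so they are broken arbitrarily.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Always keep at least one node, rounding the cut-off up.
    n_keep = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:n_keep]

# Invented degree scores for ten genes.
degree = {"g1": 15, "g2": 3, "g3": 9, "g4": 1, "g5": 12,
          "g6": 4, "g7": 7, "g8": 2, "g9": 5, "g10": 6}

hubs_10 = top_ranked(degree, 0.10)  # top 10% of nodes
hubs_40 = top_ranked(degree, 0.40)  # top 40% of nodes
```

Running the same function on betweenness and closeness and comparing the resulting lists is one way to get a feeling for where a sensible threshold lies.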
Thank you so much for the detailed explanation. I also looked into various articles where they mention "TOP RANKED" genes. I will also select the top-ranked genes and perform further analysis.
I read somewhere that selecting hub or bottleneck genes anywhere in the range of 10%-40% of genes does not have much impact on the results.
Yes, that's a good general figure. I have been using >0.4 (>40%) in a recent experiment.