Selecting a minimum gene subnetwork size
7.5 years ago
arnetmit • 0

Hi,

I have many extensive gene networks that I want to divide into subnetworks in order to examine their relation to biological processes. I want to choose a threshold for the minimum number of genes per subnetwork so that I can eliminate redundant small subnetworks. How can I determine this value? Should it be at least 10 genes, 30 genes, or 50 genes? Do I have to try all of these values and compare the results?

gene • 1.8k views

On the face of it, splitting gene networks into sub-graphs with a fixed number of genes doesn't make much sense unless you know that all sub-graphs should have the same size. Even if you do know the size of each sub-graph, what criteria are you using to split the networks, i.e. how do you decide which gene goes in which sub-graph? It seems to me that a graph node clustering approach would be more suitable. This can be followed by a standard GO term enrichment analysis for biological processes.


Thank you for the reply.

I use MCL to obtain the subnetworks, but their sizes naturally vary, so I want to set a threshold for the minimum number of nodes in a subnetwork. To sum up, I plan to try three different threshold values for this purpose. I hope that will be adequate.
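
To make the comparison of thresholds concrete, here is a minimal Python sketch of the size filter described above. It assumes the clusters are available as plain text with one cluster per line and tab-separated gene IDs, and that clusters do not overlap (so gene counts can simply be summed). The function names and the toy gene IDs are hypothetical, used only for illustration.

```python
from typing import Dict, List, Sequence


def parse_clusters(text: str) -> List[List[str]]:
    """Parse clusters from text: one cluster per line, tab-separated gene IDs."""
    return [line.strip().split("\t") for line in text.splitlines() if line.strip()]


def summarize_thresholds(
    clusters: Sequence[Sequence[str]], thresholds: Sequence[int]
) -> Dict[int, Dict[str, float]]:
    """For each minimum size, report how many clusters and genes survive the filter."""
    total_genes = sum(len(c) for c in clusters)
    summary = {}
    for t in thresholds:
        kept = [c for c in clusters if len(c) >= t]
        genes_kept = sum(len(c) for c in kept)
        summary[t] = {
            "clusters_kept": len(kept),
            "genes_kept": genes_kept,
            "fraction_genes_kept": genes_kept / total_genes if total_genes else 0.0,
        }
    return summary


# Toy input (hypothetical gene IDs): three clusters of sizes 12, 3 and 5.
example_text = (
    "\t".join(f"g{i}" for i in range(12)) + "\n"
    + "a\tb\tc\n"
    + "x1\tx2\tx3\tx4\tx5\n"
)
example_clusters = parse_clusters(example_text)
example_summary = summarize_thresholds(example_clusters, thresholds=(4, 10))
for threshold, stats in example_summary.items():
    print(threshold, stats)
```

Looking at how many clusters and what fraction of genes each threshold discards is one way to see what a given cut-off costs before committing to it.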


That makes more sense, but is there a reason you want to exclude clusters below a certain size? Ten genes could be a whole pathway. If you want to remove redundancy between clusters coming from different graphs, you should look for overlap, and perhaps treat clusters that share more than e.g. 70% of their genes as the same cluster.
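
A rough Python sketch of that overlap check follows. "Share more than 70% of their genes" can be measured in several ways; this sketch assumes the Jaccard index (shared genes divided by the union of both clusters), which is only one possible reading. The greedy keep-the-larger-cluster strategy and the function names are likewise assumptions for illustration.

```python
from typing import Iterable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Fraction of shared genes relative to the union of both clusters."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_redundant(clusters: Iterable[Set[str]], max_shared: float = 0.7) -> List[Set[str]]:
    """Greedily keep clusters, largest first, skipping any cluster whose
    Jaccard index with an already kept cluster exceeds max_shared."""
    kept: List[Set[str]] = []
    for cluster in sorted(clusters, key=len, reverse=True):
        if all(jaccard(cluster, k) <= max_shared for k in kept):
            kept.append(cluster)
    return kept


# Toy clusters (hypothetical gene IDs) pooled from different graphs.
pooled = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "d", "e"},  # shares 4 of 5 genes in the union with the first
    {"x", "y"},
]
non_redundant = drop_redundant(pooled, max_shared=0.7)
print(non_redundant)
```

Note that this removes redundancy by content rather than by size, so the small `{"x", "y"}` cluster survives because nothing else resembles it.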


I don't want sub-networks of equal size. I just plan to eliminate sub-networks whose size is below a threshold, because I have plenty of sub-networks. I am trying to infer sub-networks that have more than 10, 50, or 100 genes (meaning that some of them have sizes such as 15, 25, 64, etc.).


Yes, this is what I understood, but to me it doesn't make sense to eliminate clusters a priori based on size if you're interested in the biological processes they may represent. As I wrote, a cluster of 5-10 genes could comprise all the known genes for a given process or pathway. Based on the limited information you gave, I don't see why having many clusters is a problem.


For instance, WGCNA recommends a minimum module size (its package tutorial uses 30 genes per module). If I have an extensive network, it naturally has lots of small sub-networks, so more of them are redundant. Is that weird?


How are the WGCNA recommendations relevant to your data? Also, what do you mean by redundant? Unless you tell us more about what you're doing and the question you're trying to address, all I see is that you've got clusters of genes and want to know whether they are related to some biological processes. The standard approach for this is to run a GO term enrichment analysis on each cluster.
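
For reference, the statistic behind a typical over-representation test can be sketched in a few lines of Python. This is only an illustration of a one-sided hypergeometric test for a single GO term and a single cluster, not a replacement for a dedicated enrichment tool; the function name and the numbers are hypothetical, and in practice the p-values need multiple-testing correction across all terms tested.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the cluster, k: cluster genes annotated with the term.
    """
    upper = min(n, K)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical numbers: a 10-gene background with 5 annotated genes,
# and a 5-gene cluster in which all 5 genes carry the annotation.
p_all_annotated = enrichment_p_value(k=5, n=5, K=5, N=10)
print(p_all_annotated)
```

Because the test takes the cluster size `n` into account directly, a small cluster can still reach significance when most of its genes carry the same annotation.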


GO term enrichment analysis on each cluster is the main purpose, but the problem is deriving robust clusters that have more than a predetermined number of genes.


Why do you want or need a predetermined number of genes in each cluster? As I wrote above, without more information, this doesn't make sense.


You claim that there is no need for a threshold.

I tested it, and when my clusters are smaller I get fewer insignificant GO terms. Thank you anyway.

I have solved my problem.
