Question

Program/software to generate a large synthetic benchmark gene/protein interaction network

3.4 years ago

cwwong13 ▴ 40

I know that there are programs that can generate synthetic gene/protein regulatory networks, such as GeneNetWeaver. However, I found that these programs can usually only generate networks of up to a couple of thousand genes.

My goal is to generate a network of around 15,000 genes (or proteins) in total, with the output being a gene/protein expression table of ~100 samples. I could then use this ground-truth network to evaluate a network reconstruction algorithm.

I would appreciate hearing about any program that can do this job. If there is none, it would still be great if you could suggest where to start if I wanted to write my own (perhaps by modifying the current programs/packages, if that is possible?).
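To make the request concrete, here is a minimal sketch of the kind of output described above: a random sparse regulatory network with a known edge list, plus an expression table simulated from it. It is only a toy model under stated assumptions (an acyclic network, a saturating `tanh` response, Gaussian noise) and is far simpler than what dedicated simulators do. The function name `simulate_network` and all of its parameters are hypothetical.

```python
import math
import random


def simulate_network(n_genes=15000, n_samples=100, max_regulators=3,
                     noise_sd=0.5, seed=0):
    """Toy simulator (an assumption, not any published tool's method).

    Returns (edges, expr), where edges is a list of
    (regulator, target, weight) tuples and expr[g][s] is the simulated
    expression of gene g in sample s.
    """
    rng = random.Random(seed)
    edges = []
    regulators = [[] for _ in range(n_genes)]

    # Genes are indexed in topological order: a gene can only be regulated
    # by genes with a smaller index, so the network is acyclic by construction.
    for target in range(1, n_genes):
        k = rng.randint(0, min(max_regulators, target))
        for reg in rng.sample(range(target), k):
            weight = rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 1.5)
            edges.append((reg, target, weight))
            regulators[target].append((reg, weight))

    # Simulate each sample by propagating values from regulators to targets.
    expr = [[0.0] * n_samples for _ in range(n_genes)]
    for gene in range(n_genes):
        for s in range(n_samples):
            signal = sum(w * expr[reg][s] for reg, w in regulators[gene])
            # tanh keeps the regulatory input bounded; noise is added on top.
            expr[gene][s] = math.tanh(signal) + rng.gauss(0.0, noise_sd)
    return edges, expr


# Small example: 200 genes x 10 samples.
edges, expr = simulate_network(n_genes=200, n_samples=10, seed=1)
print(len(edges), "edges;", len(expr), "genes x", len(expr[0]), "samples")
```

Because the loops are linear in the number of genes, edges, and samples, the default arguments (15,000 genes, 100 samples) are within reach of plain Python, although a vectorized implementation would be faster.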

Thanks!

PPI interaction benchmark network R • 826 views

ADD COMMENT • link 3.4 years ago by cwwong13 ▴ 40

score 0 · Answer 1 · 2021-07-05

3.4 years ago

Mensur Dlakic ★ 28k

I suggest you look into papers that are similar to the GeneNetWeaver publication, or those that cite it.

I am by no means an expert on this topic, but it seems like overkill to work with datasets that have 15K nodes. I mean, how many cell types in any organism express 15K genes at the same time? Even if they do, what method can measure that many genes reliably? Even if you have such a capability, differential expression methods will rarely give networks that have more than a couple of thousand nodes.

ADD COMMENT • link 3.4 years ago by Mensur Dlakic ★ 28k

I am very new to the field and do not even know what an ordinary biological network looks like. I have some follow-up questions and would like to know whether I can scale down:

I agree that no cell will express 15K genes at the same time. However, if I measure mRNA expression by RNA-seq/microarray, I can easily get expression data for ~11,000-15,000 genes (after gently filtering out genes that are not expressed in any of the samples). What I wonder is whether all of these genes form one big connected network. In other words, do genes/proteins tend to form many disconnected subnetworks that are independent of each other (with no links/edges between them at all)? I am asking about reality (the ground-truth biological system in mammals, based on our current best knowledge), not about the networks currently reconstructed by various algorithms.
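For any given edge list (from an interaction database or from a simulator), the "one big network or many disconnected subnetworks" question can at least be checked directly. The following is a minimal sketch, assuming genes are numbered `0..n_nodes-1` and edges are treated as undirected; the function name `component_sizes` is hypothetical.

```python
from collections import deque


def component_sizes(n_nodes, edges):
    """Return the sizes of all connected components, largest first.

    edges is an iterable of (a, b) pairs; direction is ignored.
    """
    neighbours = [set() for _ in range(n_nodes)]
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = [False] * n_nodes
    sizes = []
    for start in range(n_nodes):
        if seen[start]:
            continue
        # Breadth-first search from every node not yet visited.
        seen[start] = True
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in neighbours[node]:
                if not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Six genes: one component of 3, one of 2, and one isolated gene.
print(component_sizes(6, [(0, 1), (1, 2), (3, 4)]))  # [3, 2, 1]
```

A result dominated by one very large component would argue against treating the network as a set of fully independent subnetworks, while many mid-sized components would support it.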

In either case, I wonder whether it is reasonable to simply merge multiple networks generated by GeneNetWeaver. If real biology is composed of multiple discrete subnetworks, I am thinking of simply concatenating, for instance, 10 networks of ~1,500 genes each. On the other hand, if these subnetworks still depend on each other, how are they usually connected? If they are connected through 1 or 2 key regulators, could I just artificially add some correlation between two subnetworks during the concatenation?

Of course, the synthetic network does not need to fit reality perfectly (e.g. not necessarily 10 networks of ~1,500 genes, but maybe 20 subnetworks with 200-2,000 nodes each), but I still want it to resemble the critical structures/organization found in real mammalian systems.
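The concatenation idea can be sketched as follows: each module keeps its own edges, gene indices are shifted so they do not collide, and a few regulator-to-target "bridge" edges are optionally added between modules. This is only an illustration under assumptions; the function name `merge_modules`, the chain-like linking of module i to module i+1, and the random choice of bridging genes are all hypothetical and not based on GeneNetWeaver or on known mammalian network structure.

```python
import random


def merge_modules(modules, n_bridges=2, seed=0):
    """Concatenate modules and add a few cross-module edges.

    modules is a list of (n_genes, edges) tuples, where edges holds
    (regulator, target) pairs using indices local to that module.
    Returns (total_genes, all_edges, bridge_edges) with global indices.
    """
    rng = random.Random(seed)
    offsets = []
    merged = []
    total = 0
    for n_genes, edges in modules:
        offset = total
        offsets.append(offset)
        # Shift local indices so every module gets its own index range.
        merged.extend((a + offset, b + offset) for a, b in edges)
        total += n_genes

    bridges = []
    for i in range(len(modules) - 1):
        # Assumption: module i regulates module i+1 through n_bridges edges.
        for _ in range(n_bridges):
            regulator = offsets[i] + rng.randrange(modules[i][0])
            target = offsets[i + 1] + rng.randrange(modules[i + 1][0])
            bridges.append((regulator, target))
    return total, merged + bridges, bridges


# Two tiny modules (3 genes and 2 genes) joined by a single bridge edge.
modules = [(3, [(0, 1), (1, 2)]), (2, [(0, 1)])]
total, all_edges, bridges = merge_modules(modules, n_bridges=1)
print(total, all_edges, bridges)
```

With `n_bridges=0` the modules stay fully disconnected. If bridge edges are added, the expression data would presumably need to be simulated from the merged network, since adding correlation to tables that were simulated separately would leave the data and the ground-truth edge list inconsistent.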

Thanks!

ADD REPLY • link 3.4 years ago by cwwong13 ▴ 40