Question

Building Gene Regulatory Networks From Literature

6


12.7 years ago

Diana ▴ 930

Hi everyone!!

Does anyone know of any available software that would build gene regulatory networks from literature alone?

Thanks!

gene network software text • 5.4k views

updated 12.7 years ago by Alex Paciorkowski 3.5k • written 12.7 years ago by Diana ▴ 930

1


I have heard of groups that have recently been encoding regulatory networks from the literature manually or semi-manually. There are a couple of software packages and format specifications for it. But maybe you are asking about automatically inferring regulatory networks from high-throughput gene expression or proteomics assays.

12.7 years ago by 14134125465346445 ★ 3.6k

score 5 · Answer 1 · 2012-04-20

There was some work done on this by Saric et al a few years back that was implemented in STRING:

Large-scale extraction of gene regulation for model organisms in an ontological context. http://www.ncbi.nlm.nih.gov/sites/entrez/15972005

Extraction of regulatory gene/protein networks from Medline. http://www.ncbi.nlm.nih.gov/sites/entrez/16046493

The RegulonDB team also did some work in this area:

Automatic reconstruction of a bacterial regulatory network using Natural Language Processing. http://www.ncbi.nlm.nih.gov/sites/entrez/17683642

Unfortunately, as is the case with many text mining papers, no software is made publicly available for either of these systems, but you could use some of the resources to reconstruct a related system for yourself.

score 3 · Answer 2 · 2012-04-21

Diana, the issue with building any gene regulatory network based solely on literature is the amount of hand curation that still must go into your dataset. Otherwise, the resulting network may have little biological meaning.

Sometimes what is reported in the literature is true only in a specific developmental context. For example, transcription factor 1 may turn on transcription factor 2 only during weeks 4-10 of development of the organism in your area of interest, and otherwise the two don't interact. That could be a problem if some of the papers your algorithm mines include data from the adult end of the organism's lifespan, when the genes don't interact. Other times the literature is wrong, or when data on a gene was published no one knew there were actually three closely related genes, not just one. (I just finished a project where this was the case, so a lot of the old expression data on gene FOO is a mix of what we now know are the later-discovered genes FOOA, FOOB, and their cousin FOOC.) And so on.

Any network reconstruction project most wisely begins with a phase of expert curation of the dataset to be analyzed, to make sure you have apples with apples, and oranges elsewhere. After that, manually checking your algorithm output ("Is our algorithm finding known interactions that we know to be true? If not, why not?") is also important. Otherwise you end up with an indigestible hairball of dubious biological relevance, or end up including things like "RNA polymerase" as a critical hub. Of course, having a last phase with wet-lab biological validation of at least the key interactions in your network is also important.
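Part of the output check described above can be scripted. What follows is only a minimal sketch, assuming you already have a list of predicted regulator-to-target edges and a small hand-curated list of interactions you trust. The function names, the `max_degree` threshold, and the gene symbols (TF1, TF2, FOOA, FOOB, POLR2A) are illustrative placeholders, not output from any real tool.

```python
from collections import Counter

def recall_of_known(predicted, known):
    """Return the fraction of curated edges recovered and the edges missed.

    Edges are directed (regulator, target) pairs.
    """
    predicted = {tuple(e) for e in predicted}
    known = {tuple(e) for e in known}
    missed = sorted(known - predicted)
    recall = (len(known) - len(missed)) / len(known) if known else 0.0
    return recall, missed

def suspicious_hubs(predicted, max_degree):
    """List nodes whose total degree (in plus out) exceeds max_degree.

    max_degree is an arbitrary cut-off you would tune by hand.
    """
    degree = Counter()
    for regulator, target in {tuple(e) for e in predicted}:
        degree[regulator] += 1
        degree[target] += 1
    return sorted(node for node, d in degree.items() if d > max_degree)

# Toy data: hypothetical edges mined from text.
predicted = [
    ("TF1", "TF2"),
    ("POLR2A", "TF1"),
    ("POLR2A", "TF2"),
    ("POLR2A", "FOOA"),
]
# Toy data: hypothetical hand-curated interactions.
known = [("TF1", "TF2"), ("TF2", "FOOB")]

recall, missed = recall_of_known(predicted, known)
hubs = suspicious_hubs(predicted, max_degree=2)
print(recall, missed, hubs)
```

The missed edges are the ones worth reading up on by hand ("If not, why not?"), and anything flagged as a hub is a candidate for review by someone who knows the biology rather than something to drop automatically.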

I guess my main message is that network reconstruction is more than just a question of software, it's a fairly complex undertaking by a team with various areas of expertise.

Eric Davidson wrote an elegant book (The Regulatory Genome) on gene regulatory networks, and the amount of downstream validation of the predictions in that work is truly impressive.

score 1 · Answer 3 · 2012-04-21

Hi Diana

You could look at the Agilent Literature Search plugin (http://www.agilent.co.uk/labs/research/litsearch.html) in Cytoscape (www.cytoscape.org). This takes a list of gene IDs and constructs a network, using co-citation I believe. If you're using Cytoscape you can carry out all sorts of network comparisons, overlaps, etc.

Best

duff
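To make the co-citation idea concrete, here is a minimal sketch of how such a network could be put together. It is not the plugin's actual implementation. It assumes you have already collected abstracts as plain strings, and it links two genes when they are mentioned together in at least `min_papers` abstracts. The function name, the threshold, and the gene symbols are hypothetical.

```python
from collections import Counter
from itertools import combinations
import re

def cocitation_edges(abstracts, genes, min_papers=2):
    """Link gene pairs mentioned together in at least min_papers abstracts."""
    # Whole-word, case-insensitive match so that "TF1" does not match "TF10".
    patterns = {
        g: re.compile(r"\b" + re.escape(g) + r"\b", re.IGNORECASE)
        for g in genes
    }
    counts = Counter()
    for text in abstracts:
        mentioned = sorted(g for g, p in patterns.items() if p.search(text))
        # Each unordered pair is counted once per abstract.
        counts.update(combinations(mentioned, 2))
    return {pair: n for pair, n in counts.items() if n >= min_papers}

# Toy abstracts, invented for illustration only.
abstracts = [
    "TF1 activates TF2 during early development.",
    "Expression of TF2 depends on TF1 and FOOA.",
    "FOOA is expressed in adult tissue.",
]
edges = cocitation_edges(abstracts, ["TF1", "TF2", "FOOA"])
print(edges)
```

Co-mention says nothing about the direction or sign of regulation, and plain symbol matching will trip over gene synonyms, so the result is a starting point for curation, not a regulatory network in itself.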