Question

Database Of Host And Pathogen Pairs

6


12.8 years ago

Will 4.6k

I'm looking to do a project on predicting which bacteria colonize a particular host and on identifying the genomic features that determine these interactions.

Does anyone know of a good database that annotates known host-bacteria interactions? I know I could pull the cross-organism interactions from a PPI database like BIND, but that only gives a handful of examples and seems overly restrictive.

interaction • 5.8k views

Updated 12.8 years ago by Hamish ★ 3.3k • written 12.8 years ago by Will 4.6k

Michael · Answer 1 · 2012-02-27

5


12.8 years ago

Pierre Lindenbaum 164k

See how @rdmpage built a database of host-pathogen pairs using GenBank: http://iphylo.blogspot.com/2011/03/visualising-symbiome-hosts-parasites.html

Back in 2006 in a short post entitled "Building the encyclopedia of life" I wrote that GenBank is a potentially rich source of information on host-parasite relationships. Often sequences of parasites will include information on the name of the host (the example I used was sequence AF131710 from the platyhelminth Ligophorus mugilinus, which records the host as the Flathead mullet Mugil cephalus).

Edit: another one is PHI-base, the pathogen-host interactions database: http://www.phi-base.org/

This database contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions. Information is also given on the target sites of some anti-infective chemistries.

Updated 9.7 years ago by Michael 55k • written 12.8 years ago by Pierre Lindenbaum 164k

4


I've been meaning to take this project beyond the blog post stage. If there's enough interest I could look at creating a web site and services around the host-parasite data in GenBank.

12.8 years ago by Roderic Page ▴ 390

1


Big (depending on taxonomic scope). I built the visualisations from a subset of GenBank (mainly eukaryote non-EST sequences). If you ask GenBank how many sequences have the "host" field (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=host&retmax=10), today it returns 3,466,914.

12.8 years ago by Roderic Page ▴ 390
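
For anyone who wants to reproduce that count from a script, here is a minimal Python sketch of the same ESearch query. The function names are made up for illustration, and it uses `https` rather than the `http` URL quoted above. As with the URL itself, the count covers every record that matches the term anywhere, not only records that carry a host field.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", retmax=10):
    """Build the same kind of query URL as the one quoted in the comment."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


def parse_count(xml_text):
    """Pull the total hit count out of an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


def fetch_count(term):
    """Run the query over the network and return the hit count."""
    with urlopen(build_esearch_url(term)) as response:
        return parse_count(response.read())
```

Calling `fetch_count("host")` needs network access, and the number will have grown since this thread was written.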

0


Excellent ... wouldn't have thought to look in GenBank!

12.8 years ago by Will 4.6k

0


@roderic: I've already got a quick-and-dirty Python script to extract the data. Do you remember roughly how many associations you found? I'm just trying to get an idea of how large this symbiome is going to be.

12.8 years ago by Will 4.6k

0


Just discovered that the search in my previous comment matches "host" anywhere in the sequence record, so it also returns sequences that have no "host" field but mention host in, say, the title of the article that published the sequence. That figure therefore overestimates the number of sequences with a host recorded.

12.8 years ago by Roderic Page ▴ 390

0


Yeah, parsing through now ... looks to be ~1.7 million triples (genbank-record, host, symbiote).

12.8 years ago by Will 4.6k
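
The extraction script itself was not posted in the thread, so the following is only a rough sketch of how such triples could be pulled from GenBank flat-file text. It assumes the host is stored as a `/host="..."` qualifier on the source feature and that records end with `//`. The sample records are hand-written and heavily abbreviated; the accession `XX000001` is a made-up placeholder, and the names `Triple` and `iter_triples` are illustrative.

```python
import re
from typing import Iterator, NamedTuple


class Triple(NamedTuple):
    accession: str
    host: str
    symbiont: str


_ACCESSION = re.compile(r"^ACCESSION\s+(\S+)", re.MULTILINE)
_ORGANISM = re.compile(r"^[ \t]+ORGANISM[ \t]+(.+)$", re.MULTILINE)
_HOST = re.compile(r'/host="([^"]+)"')


def iter_triples(flatfile_text: str) -> Iterator[Triple]:
    """Yield (accession, host, symbiont) for each record that has a /host qualifier."""
    for record in flatfile_text.split("\n//"):
        acc = _ACCESSION.search(record)
        org = _ORGANISM.search(record)
        host = _HOST.search(record)
        if acc and org and host:
            # Qualifier values can wrap across lines, so collapse whitespace.
            host_name = " ".join(host.group(1).split())
            yield Triple(acc.group(1), host_name, org.group(1).strip())


# Hand-written, abbreviated records for illustration only.
SAMPLE = """LOCUS       AF131710
ACCESSION   AF131710
SOURCE      Ligophorus mugilinus
  ORGANISM  Ligophorus mugilinus
FEATURES             Location/Qualifiers
     source          1..100
                     /organism="Ligophorus mugilinus"
                     /host="Mugil cephalus"
//
LOCUS       XX000001
ACCESSION   XX000001
SOURCE      Escherichia coli
  ORGANISM  Escherichia coli
FEATURES             Location/Qualifiers
     source          1..100
                     /organism="Escherichia coli"
//
"""

if __name__ == "__main__":
    for triple in iter_triples(SAMPLE):
        print(triple)
```

Records without a `/host` qualifier (the second sample record) are skipped. Free-text host values would still need cleaning up before they can be mapped to taxa.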

score 3 · Answer 2 · 2012-02-27

The pathogen-specific resources might be a useful starting point. For example:

PhytoPath
VectorBase
EuPathDB
WTSI Pathogen genetics
PHI-base "The Pathogen - Host Interaction Database"

If you are including viruses then UniProtKB may be a useful source, since it details organism/host for viruses. Sadly they don't seem to have included other organism/host relationships.

Other possible sources that come to mind are:

Metabolic pathway databases. For example, KEGG PATHWAY has a set of pathways related to infectious disease that could be useful.
Microarray expression databases. For example, ArrayExpress contains details of experiments looking for changes in gene expression related to various disease states (try a search for terms like 'infected').

While I suspect that these will suffer from similar limitations to the PPI data, they are worth looking at. Additional pairings from the analysis of the INSDC databases, as suggested by Roderic, will extend coverage, as will text-mining of the literature.

score 2 · Answer 3 · 2012-02-27

2


12.8 years ago

Behindtherabbit ▴ 60

There are over 9000 experimentally confirmed host-pathogen PPIs from bacteria available from public PPI databases. If you include viruses as well, that number is much higher. Several public (and published) resources cull unique HP-PPI pairs from the broader databases, including PATRIC (our group), HPIdb, PHI-base, and more. Also check out the PSICQUIC web interface at EBI, which lets you query many public databases yourself. Hope that helps!


0


I don't really need the PPI level info, I just need the organism interaction level. When I looked through BIND and NCBI's repository I only found ~500 unique host-pathogen associations.

12.8 years ago by Will 4.6k
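
If record-level data is the starting point, getting to organism-level associations is mostly a matter of collapsing duplicates. A small sketch follows, assuming the input is a list of (record, host, symbiont) tuples. The lower-casing is an assumed normalisation step, and every accession except AF131710 is a made-up placeholder.

```python
from collections import Counter


def unique_associations(triples):
    """Count supporting records for each unique (host, symbiont) pair."""
    return Counter(
        (host.strip().lower(), symbiont.strip().lower())
        for _record, host, symbiont in triples
    )


# Illustrative input: two records support the same pair.
EXAMPLE_TRIPLES = [
    ("AF131710", "Mugil cephalus", "Ligophorus mugilinus"),
    ("XX000001", "Mugil cephalus", "Ligophorus mugilinus"),
    ("XX000002", "Homo sapiens", "Escherichia coli"),
]

if __name__ == "__main__":
    pairs = unique_associations(EXAMPLE_TRIPLES)
    print(len(pairs), "unique associations")
```

Keeping the per-pair record count is optional, but it gives a crude measure of how well supported each association is.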

score 2 · Answer 4 · 2012-02-27

2


12.8 years ago

Casey Bergman 18k

You could also try EnvDB, a "database that aims to provide the most complete census to-date of the environmental distribution of prokaryotes". Under "environments", select "host associated".
