I'm looking to do a project on predicting which bacteria colonize a particular host and identifying the genomic features that determine these interactions.
Does anyone know of a good database that annotates known interactions? I know I could pull the cross-organism interactions from a PPI database like BIND, but that only gives a handful of examples and seems overly restrictive.
I've been meaning to take this project beyond the blog post stage. If there's enough interest I could look at creating a web site and services around the host-parasite data in GenBank.
Big (depending on taxonomic scope). I built the visualisations from a subset of GenBank (mainly eukaryote non-EST sequences). If you ask GenBank how many sequences have the "host" field (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=host&retmax=10), today it returns 3,466,914.
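For anyone who wants to repeat that count, here is a rough sketch of how the query above could be built and its result read. It assumes the ESearch response is XML with a top-level `Count` element; the helper names `esearch_url` and `parse_count` are made up for illustration, and the actual HTTP request is left out.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL copied from the query quoted in the thread.
ESEARCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="nucleotide", retmax=10):
    """Build an ESearch URL (hypothetical helper, not part of any library)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


def parse_count(xml_text):
    """Return the total hit count from an ESearch XML response.

    Assumes the root element has a direct child named Count.
    """
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the response body to `parse_count` should give the total number of matching records.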
excellent ... wouldn't have thought to look in GenBank!
@roderic: I've already got a quick-and-dirty Python script to extract the data. Do you remember roughly how many associations you found? I'm just trying to get an idea of how large this symbiome is going to be.
Just discovered that the search in my previous comment matches "host" anywhere in the sequence record, so it will also return sequences that have no "host" field but have "host" in, say, the title of the article that published the sequence. So that count will overestimate the number of sequences with a recorded host.
yeah, parsing through now ... looks to be ~1.7 million triples (genbank-record, host, symbiote)
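For reference, a minimal sketch of how such triples might be pulled from GenBank flat-file text. This is not the script mentioned above, just an illustration. It assumes records are separated by `//` lines, that the first `/organism` qualifier in a record belongs to the source feature, and that the host is stored in a `/host="..."` qualifier. The function name `host_triples` is hypothetical.

```python
import re


def host_triples(flatfile_text):
    """Return (accession, host, organism) tuples for records with a /host qualifier.

    A regex-based sketch; a real pipeline would probably use a proper
    GenBank parser instead.
    """
    triples = []
    # Records in a GenBank flat file end with a line containing only "//".
    for record in flatfile_text.split("\n//"):
        acc = re.search(r"^ACCESSION\s+(\S+)", record, re.M)
        host = re.search(r'/host="([^"]+)"', record)
        org = re.search(r'/organism="([^"]+)"', record)
        if acc and host and org:
            # Qualifier values can wrap across lines, so collapse whitespace.
            triples.append((
                acc.group(1),
                " ".join(host.group(1).split()),
                " ".join(org.group(1).split()),
            ))
    return triples
```

Records without a `/host` qualifier are skipped, which avoids the overcounting caused by a free-text search for "host".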