As far as I can tell, there is no web service that does exactly what you want. If you know the species of interest, you can easily retrieve the gene name from BioMart using a simple XML-formatted query:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
<Dataset name = "olatipes_gene_ensembl" interface = "default" >
<Filter name = "ensembl_peptide_id" value = "ENSORLP00000023599"/>
<Attribute name = "external_gene_id" />
<Attribute name = "ensembl_peptide_id" />
</Dataset>
</Query>
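If you need to run this for many identifiers, you could generate the query instead of editing it by hand. Below is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes the dataset, filter and attribute names from the query above, which may differ in other BioMart/Ensembl releases; build_biomart_query is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset: str, peptide_id: str) -> str:
    """Build a BioMart XML query that maps one Ensembl peptide ID to its gene name."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "0",
        "count": "",
        "datasetConfigVersion": "0.6",
    })
    dataset_el = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filter on the peptide ID and ask for the gene name plus the ID itself.
    ET.SubElement(dataset_el, "Filter", {"name": "ensembl_peptide_id", "value": peptide_id})
    ET.SubElement(dataset_el, "Attribute", {"name": "external_gene_id"})
    ET.SubElement(dataset_el, "Attribute", {"name": "ensembl_peptide_id"})
    # ElementTree never writes a DOCTYPE, so the header lines are prepended by hand.
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n' + body
```

Calling build_biomart_query("olatipes_gene_ensembl", "ENSORLP00000023599") produces a query equivalent to the one above (attribute spacing aside).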
You can send this query to the BioMart web service with a simple GET request if you like (the XML is shown unencoded here for readability):
http://www.biomart.org/biomart/martservice?query=<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6"><Dataset name="olatipes_gene_ensembl" interface="default"><Filter name="ensembl_peptide_id" value="ENSORLP00000023599"/><Attribute name="external_gene_id"/><Attribute name="ensembl_peptide_id"/></Dataset></Query>
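A browser percent-encodes that query string for you, but a script has to do it itself. Here is a small sketch with urllib.parse.urlencode. It assumes the martservice endpoint above is still the one you want and that the response is plain TSV without a header row (header = "0"); martservice_url and parse_tsv are hypothetical helper names.

```python
from urllib.parse import urlencode

# Endpoint copied from the answer above; it may have moved since then.
MARTSERVICE = "http://www.biomart.org/biomart/martservice"


def martservice_url(query_xml: str) -> str:
    """Return a GET URL carrying the XML query, percent-encoded, in the `query` parameter."""
    return MARTSERVICE + "?" + urlencode({"query": query_xml})


def parse_tsv(text: str) -> list[tuple[str, ...]]:
    """Split a TSV response (header="0", so no header row) into rows of fields."""
    return [tuple(line.split("\t")) for line in text.splitlines() if line.strip()]
```

You would then fetch the URL with, for example, urllib.request.urlopen(url).read().decode("utf-8") and pass the text to parse_tsv to get (gene name, peptide ID) pairs.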
One solution, then, is to build a simple lookup table that maps the letters before the numbers (in this case ENSORLP) to the BioMart dataset to query. This is obviously a pain, since you would have to keep the table up to date with every new Ensembl release.
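A sketch of such a lookup table in Python. The four prefixes below are only an illustrative subset (human, mouse, zebrafish, medaka); the table, the regular expression and dataset_for are assumptions you would have to extend and maintain yourself.

```python
import re

# Illustrative, deliberately incomplete table: Ensembl protein-ID prefix ->
# BioMart dataset name. Extend it for every species you need and re-check
# it against each new Ensembl release.
PREFIX_TO_DATASET = {
    "ENSP": "hsapiens_gene_ensembl",      # human
    "ENSMUSP": "mmusculus_gene_ensembl",  # mouse
    "ENSDARP": "drerio_gene_ensembl",     # zebrafish
    "ENSORLP": "olatipes_gene_ensembl",   # medaka
}

# Letter prefix, then digits, then an optional ".version" suffix.
_ID_RE = re.compile(r"^([A-Z]+)\d+(?:\.\d+)?$")


def dataset_for(peptide_id: str) -> str:
    """Return the BioMart dataset for a peptide ID, based on its letter prefix."""
    m = _ID_RE.match(peptide_id)
    if m is None:
        raise ValueError(f"not an Ensembl-style ID: {peptide_id!r}")
    prefix = m.group(1)
    try:
        return PREFIX_TO_DATASET[prefix]
    except KeyError:
        raise KeyError(f"no dataset known for prefix {prefix!r}") from None
```

The KeyError for unknown prefixes is the point where this approach shows its maintenance cost: every new genome needs a new table entry.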
The other approach is to grudgingly accept that the web services cannot do what the web interface can, and have your script scrape the web interface instead. Yes, it is ugly, but it gets the job done.
In your example, you can perform the web interface query by requesting this URL.
In the resulting HTML, you should identify a section like this:
<table class="search_results">
<tr><th colspan="2">By Feature type</th></tr>
<tr><td><a href="/Multi/Search/Details?species=all;idx=;q=ENSORLP00000023599">Total</a></td><td><a href="/Multi/Search/Details?species=all;idx=;q=ENSORLP00000023599">1</a></td></tr>
<tr>
<td><a href="/Multi/Search/Details?_C=eJyLz2FIzWOIL8tjSElNSyzNKWGIL2Rw9Qv2D*IJMAADI2NTS0uF5PyigvyixJJU*ZKi1FQrpZD8Av3g*NKi5FT9VDMDJYb4jMwSt9KcHAZDAwYARHIZlw__&_c=%2b15927579347680844937" class="collapsible"><img src="/i/list_shut.gif" alt=">" style="padding-right:4px" />Gene</a>
<ul class="shut">
<li><a href="/Multi/Search/Details?_C=eJyLz2FIzWOIL8tjSElNSyzNKWGIL2Rw9Qv2D*IJMAADI2NTS0uF5PyigvyixJJU*ZKi1FQrpZD8Av3g*NKi5FT9VDMDJYb4jMwSt9KcHAZDAwYARHIZlw__&_c=%2b15927579347680844937&_c=%2b10649513383310789308">Oryzias latipes (1)</a></li></ul>
</td>
<td style="width:5em"><a href="/Multi/Search/Details?_C=eJyLz2FIzWOIL8tjSElNSyzNKWGIL2Rw9Qv2D*IJMAADI2NTS0uF5PyigvyixJJU*ZKi1FQrpZD8Av3g*NKi5FT9VDMDJYb4jMwSt9KcHAZDAwYARHIZlw__&_c=%2b15927579347680844937">1</a>
</tr>
</table>
From this, you would extract the last URL in the table and prefix it with the host:
http://ensembl.org/Multi/Search/Details?_C=eJyLz2FIzWOIL8tjSElNSyzNKWGIL2Rw9Qv2D*IJMAADI2NTS0uF5PyigvyixJJU*ZKi1FQrpZD8Av3g*NKi5FT9VDMDJYb4jMwSt9KcHAZDAwYARHIZlw__&_c=%2b15927579347680844937
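With Python's built-in html.parser you can pull that link out without a third-party library. This sketch assumes the results still sit in a table with class search_results and that the wanted link is the last href in it, exactly as in the snippet above; last_result_url is a hypothetical name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _SearchResultsLinks(HTMLParser):
    """Collect every href that appears inside <table class="search_results">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside the search_results table
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            # Track nested tables so the closing tag of an inner one doesn't end the search.
            if self.depth or "search_results" in (attrs.get("class") or "").split():
                self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1


def last_result_url(page: str, base: str = "http://ensembl.org/") -> str:
    """Return the last link in the search_results table as an absolute URL."""
    parser = _SearchResultsLinks()
    parser.feed(page)
    parser.close()
    if not parser.hrefs:
        raise ValueError("no links found in table.search_results")
    return urljoin(base, parser.hrefs[-1])
```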
Retrieving that page will give you some HTML in which you look for <p>Your query matched 1 entries in the search database</p>, after which you'll find the link to the next page to retrieve:
http://www.ensembl.org/Oryzias_latipes/Gene/Summary?g=ENSORLG00000018912;r=scaffold676:104884-110194;t=ENSORLT00000023600
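Since only the marker text is known, the sketch below simply takes the first href that follows the "Your query matched ..." paragraph. That is an assumption about the page layout, not something Ensembl guarantees; first_link_after_marker is a hypothetical helper.

```python
import re
from html import unescape
from urllib.parse import urljoin

# Marker paragraph quoted in the answer; the count varies with the query.
MARKER = re.compile(r"Your query matched \d+ entries in the search database")
HREF = re.compile(r'<a\s[^>]*?href="([^"]+)"', re.IGNORECASE)


def first_link_after_marker(page: str, base: str = "http://www.ensembl.org/") -> str:
    """Return the first link after the result-count paragraph, as an absolute URL."""
    marker = MARKER.search(page)
    if marker is None:
        raise ValueError("result-count paragraph not found")
    link = HREF.search(page, marker.end())
    if link is None:
        raise ValueError("no link after the result-count paragraph")
    # Attribute values may contain entities such as &amp;, so decode them first.
    return urljoin(base, unescape(link.group(1)))
```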
Inside this HTML page, pull out the part that looks like <h2 class="caption">Gene: HRAS (ENSORLG00000018912)</h2>, which gives you the gene name for your identifier.
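A regular expression is enough for this last step. It assumes the caption has exactly the form shown above ("Gene: NAME (STABLE_ID)"); gene_name_from_summary is a hypothetical name.

```python
import re
from html import unescape

# Matches e.g. <h2 class="caption">Gene: HRAS (ENSORLG00000018912)</h2>
CAPTION = re.compile(
    r'<h2 class="caption">\s*Gene:\s*(?P<name>[^<(]+?)\s*\((?P<stable_id>[^)<]+)\)\s*</h2>'
)


def gene_name_from_summary(page: str) -> tuple[str, str]:
    """Return (gene name, stable gene ID) from an Ensembl Gene/Summary page."""
    m = CAPTION.search(page)
    if m is None:
        raise ValueError("gene caption not found; the page layout may have changed")
    return unescape(m.group("name")), m.group("stable_id")
```

The explicit ValueError is deliberate: when Ensembl redesigns the page, you want the scraper to fail loudly rather than return a wrong name.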
This solution is obviously a pain to implement, likely to break whenever Ensembl changes its web interface, and ugly as sin. Unlike the first solution, however, it should automatically cope with Ensembl adding more genomes to its database.
The Ensembl website can do it, as you say, so there must be a way to do it programmatically. Maybe one could find out how the website does it. The result URL looks like a DAS query, but I would bet that the Perl API has a method to run such a query too. I'm not sure how this works, though.
Michael, I thought this as well and tried to dig into it. The closest thing I found was the gene_autocomplete table in the ensembl_website_60 database, but it only contains mappings from gene names to organisms and doesn't cover proteins. My guess is that they use some kind of full-text search index to support those queries.