The goal is simple, but I cannot figure out a way to make it work in any of the REST endpoints. How can I retrieve the CDS (DNA) corresponding to a Protein ID? Not relevant, but I am working in Python.
There are two approaches I thought would be possible:
- A direct retrieval using the "/sequence/id/" endpoint, with 'cds' as the 'type'.
- Identifying the corresponding transcript id from the protein id, and then using that transcript id in the "/sequence/id/" endpoint. I cannot find a way to identify the corresponding transcript id.
Neither of these approaches has been fruitful. This seems like such a simple and common need that I must just be unaware of the proper way to get it done.
Any help would be much appreciated!
What protein ID do you have? Can you please give an example?
Hi Bert!
Taking as input an Ensembl protein id 'ENSP00000430656' I would like to retrieve the corresponding transcript id, which is 'ENST00000523953'.
From there, the goal is to retrieve the CDS, which can be done with the '/sequence/id/' endpoint as follows: 'http://rest.ensembl.org/sequence/id/ENST00000523953?content-type=application/json;type=cds'.
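For reference, a minimal Python sketch of that sequence request, using only the standard library. The helper names `cds_url` and `fetch_cds` are made up for illustration, and reading the sequence from a `seq` key in the JSON response is an assumption about the response layout that should be checked against real output.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def cds_url(transcript_id, server=SERVER):
    """Build the /sequence/id/ URL that asks for a transcript's CDS as JSON."""
    return f"{server}/sequence/id/{transcript_id}?content-type=application/json;type=cds"


def fetch_cds(transcript_id):
    """Fetch the CDS for a transcript ID. Needs network access."""
    with urllib.request.urlopen(cds_url(transcript_id)) as response:
        data = json.load(response)
    # Assumption: the JSON body carries the sequence under the "seq" key.
    return data["seq"]
```

If that assumption holds, `fetch_cds("ENST00000523953")` would return the CDS as a plain string.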
I just have not been able to find any way to retrieve a corresponding Transcript id using just the Protein id as input.
The best solution so far involves using the Gene id in an 'Overlap' query like this:
'http://rest.ensembl.org/overlap/id/ENSG00000133742?feature=gene;content-type=application/json;feature=cds'
This is the only location I have identified where both Protein id and corresponding Transcript id exist in the same entry.
-Harlan
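The overlap workaround described above could be sketched in Python roughly as follows. This is only an illustration: the helper names are hypothetical, and the field names `feature_type`, `protein_id` and `Parent` are assumptions about how the CDS entries are keyed in the JSON response, so they should be verified against the actual output.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def overlap_url(gene_id, server=SERVER):
    """Build the overlap query from the post (gene and cds features, JSON)."""
    return f"{server}/overlap/id/{gene_id}?feature=gene;content-type=application/json;feature=cds"


def transcript_for_protein(features, protein_id):
    """Return the transcript ID of the first CDS entry matching protein_id, or None."""
    for feature in features:
        # Assumption: CDS entries carry "protein_id" and the transcript ID in "Parent".
        if feature.get("feature_type") == "cds" and feature.get("protein_id") == protein_id:
            return feature.get("Parent")
    return None


def fetch_transcript_id(gene_id, protein_id):
    """Run the overlap query and look up the transcript. Needs network access."""
    with urllib.request.urlopen(overlap_url(gene_id)) as response:
        features = json.load(response)
    return transcript_for_protein(features, protein_id)
```

A gene usually yields several CDS entries per protein (one per coding exon), so the lookup stops at the first match. The drawback noted in the post remains: the gene ID has to be known in advance.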
I forwarded your question to one of the experts, Magali from the Ensembl team. Please see her answer below.