Question

ENSEMBL REST API: get homology ID

8.4 years ago

ana16 ▴ 10

Hi everyone, I am looking to get all the ortholog sequences from ENSEMBL in FASTA format, given a human gene ID. I found the ENSEMBL REST API and think it can do what I am looking for. I was able to get some output using:

wget -q --header='Content-type:text/xml' 'https://rest.ensembl.org/homology/id/ENSG00000157764?sequence=cdna;type=orthologues'  -O -

However, the output contains a lot of additional information (such as headers and descriptions), and the human gene alignment is repeated several times. A small example is shown below:

d":"ENSG00000157764"},"dn_ds":null,"target":{"perc_pos":22,"protein_id":"ENSP00000309597","taxon_id":9606,"cigar_line":"

I would like to simply get each ortholog in a nice FASTA file, starting with the >Homo_sapiens query.

>Homo_sapiens_geneID
ATGTTATATG
>mus_musculus_OrthologID
ATGTTAAATG

Is there any post-processing that I could apply to this output in order to get what I am looking for? Or is there another program that could do something similar (for example, take several ENSEMBL ortholog IDs as input and retrieve their cDNA in FASTA format)?

Thank you very much for your help, I appreciate your feedback.

Ana

ENSEMBL API alignment homology • 1.8k views

updated 8.4 years ago by Jean-Karim Heriche 27k • written 8.4 years ago by ana16 ▴ 10

score 3 · Accepted Answer · 2016-06-23

8.4 years ago

Jean-Karim Heriche 27k

The Ensembl Perl API allows you to get just the data you need.
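If staying with the REST endpoint from the question is preferable, the JSON it returns can probably be post-processed into FASTA with a short script. The following is only a minimal sketch in Python, not an official Ensembl recipe. It assumes the response is laid out as `data` -> `homologies` -> `source`/`target`, and that each member carries `species`, `id`, and its sequence under `align_seq` (or `seq` when unaligned). Those key names are inferred from the snippet in the question and should be checked against a real response. The function names are made up for illustration.

```python
import json
import urllib.request

# Endpoint and parameters copied from the wget command in the question.
URL = ("https://rest.ensembl.org/homology/id/{gene_id}"
       "?sequence=cdna;type=orthologues")


def fetch_homologies(gene_id):
    """Download the homology record for one gene and parse it as JSON."""
    request = urllib.request.Request(
        URL.format(gene_id=gene_id),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def _sequence(member):
    # Assumption: the sequence sits under "align_seq" (with "-" gaps)
    # or under "seq"; gap characters are stripped in both cases.
    return (member.get("align_seq") or member.get("seq") or "").replace("-", "")


def homologies_to_fasta(response):
    """Return FASTA text: each query gene once, then one record per ortholog."""
    records = []
    for entry in response.get("data", []):
        query_written = False
        for homology in entry.get("homologies", []):
            source, target = homology["source"], homology["target"]
            if not query_written:
                # The query gene is repeated in every homology; keep it once.
                records.append((source["species"], source["id"], _sequence(source)))
                query_written = True
            records.append((target["species"], target["id"], _sequence(target)))
    return "".join(f">{species}_{gene}\n{seq}\n" for species, gene, seq in records)
```

Something like `print(homologies_to_fasta(fetch_homologies("ENSG00000157764")))` would then write the query first and the orthologs after it. Looping over several gene IDs and concatenating the results would cover the multi-ID case.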
