Hi everyone, I am looking to get all the ortholog sequences from ENSEMBL in FASTA format, given a human gene ID. I found the ENSEMBL REST API and think it can do what I am looking for. I was able to get some output using:
wget -q --header='Content-type:text/xml' 'https://rest.ensembl.org/homology/id/ENSG00000157764?sequence=cdna;type=orthologues' -O -
However, the output contains a lot of additional information (such as headers and descriptions), and the human gene alignment is repeated several times. A small example is shown below:
d":"ENSG00000157764"},"dn_ds":null,"target":{"perc_pos":22,"protein_id":"ENSP00000309597","taxon_id":9606,"cigar_line":"
I would like to simply get each ortholog in a nice FASTA file, starting with the >Homo_sapiens query.
>Homo_sapiens_geneID
ATGTTATATG
>Mus_musculus_OrthologID
ATGTTAAATG
Is there any post-processing that I could apply to this output to get what I am looking for? Or is there another program that could do something similar (input: several ENSEMBL ortholog IDs; output: their cDNA in FASTA format)?
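To make the kind of post-processing concrete, here is a rough Python sketch, not a tested solution. It assumes the response is requested as JSON (for example with the header 'Content-type:application/json') and has already been parsed into a dict. It also assumes the layout data -> homologies -> source/target, where each member carries "species", "id", and either "seq" or "align_seq". Those key names are inferred from the fragment above and should be checked against the real output. The function name homologies_to_fasta and the width parameter are made up for illustration.

```python
def homologies_to_fasta(response, width=60):
    """Turn a parsed homology response (a dict) into FASTA text.

    Assumed layout (verify against the real output):
    response["data"][i]["homologies"][j]["source" | "target"], each with
    "species", "id", and "seq" or "align_seq".
    """
    records = []
    seen = set()  # gene IDs already written, so the query appears only once
    for entry in response.get("data", []):
        for homology in entry.get("homologies", []):
            # "source" is the query gene, so handling it first puts the
            # human record at the top of the file.
            for side in ("source", "target"):
                member = homology.get(side, {})
                gene_id = member.get("id")
                seq = member.get("seq") or member.get("align_seq") or ""
                seq = seq.replace("-", "")  # drop alignment gap characters
                if not gene_id or not seq or gene_id in seen:
                    continue
                seen.add(gene_id)
                header = ">{}_{}".format(member.get("species", "unknown"), gene_id)
                # Wrap the sequence at a fixed line width.
                lines = [seq[i:i + width] for i in range(0, len(seq), width)]
                records.append("\n".join([header] + lines))
    return "\n".join(records) + ("\n" if records else "")
```

With the JSON saved to a file, the idea would be to load it with json.load() and write the returned string to a .fasta file. Species names come out exactly as they appear in the response.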
Thank you very much for your help, I appreciate your feedback.
Ana