Question

Get Exons & Introns Using the Ensembl REST API

11.0 years ago

Gungor Budak ▴ 270

Hello all

I have some gene IDs from Ensembl and I want to get their transcripts' exons and introns (sequences) so that later I can determine exon/intron boundaries and do some analyses.

I've discovered the Ensembl REST API, which is a really easy and clean way of getting data, and played around with it a bit. Using this API, I could get the coding transcripts of the genes and then the sequences of these transcripts. However, I couldn't find any way to distinguish exonic from intronic regions in these sequences.

Here is my script, which gets the sequences of the transcripts of the gene "ENSG00000197568" in FASTA format. I want to get the exons and introns the way Ensembl presents them here.

#!/usr/bin/env python3

import re
import sys

import httplib2


def check_response(response):
    # Abort on any non-200 HTTP status.
    if response.status != 200:
        print("Invalid response:", response.status)
        sys.exit(1)


http = httplib2.Http(".cache")
server = "http://beta.rest.ensembl.org/"
gene_id = "ENSG00000197568"
query = "sequence/id/" + gene_id + "?type=cds;multiple_sequences=1"
headers = {"Content-Type": "text/x-fasta"}

# Fetch the CDS of every transcript of the gene; each FASTA header holds an ID.
response, content = http.request(server + query, method="GET", headers=headers)
check_response(response)
transcripts = re.findall(">(.*)", content.decode("utf-8"))

# Fetch each sequence by ID and write all records to one FASTA file.
with open("output.fasta", "w") as f:
    for transcript in transcripts:
        query = "sequence/id/" + transcript
        response, content = http.request(server + query, method="GET", headers=headers)
        check_response(response)
        f.write(content.decode("utf-8"))

Thanks in advance

11.0 years ago

Emily 24k

Hi Gungor

I'm afraid we don't have a straightforward option of downloading a gene sequence in that format using the REST API at present. The service is still in its beta phase, so is not yet at its full capability. We're trying to prioritise functionality that we know users are interested in, so we will take your feedback into account when deciding which endpoints we want to add next.

You can get the exons using the sequence/id method.
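Fetching exons one by one gives their sequences, but the exon/intron boundaries asked about in the question also depend on coordinates. As a minimal sketch, assuming you already have the start and end of each exon of one transcript (the numbers below are made up, and `introns_from_exons` is a hypothetical helper, not part of any Ensembl API), the introns are simply the gaps between consecutive exons:

```python
def introns_from_exons(exons):
    """Return intron (start, end) intervals lying between consecutive exons.

    exons: iterable of (start, end) pairs, 1-based and inclusive, all from
    one transcript. Order does not matter; they are sorted by start here.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:  # skip exons that touch each other
            introns.append((prev_end + 1, next_start - 1))
    return introns


exons = [(100, 199), (300, 349), (500, 620)]  # made-up coordinates
introns = introns_from_exons(exons)
print(introns)
```

The intervals are in genomic order regardless of strand, so for a reverse-strand transcript the first interval returned is the last intron in transcript order.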

If you're a Perl programmer, this data is very easy to get via the Perl API, which I can help you with if needed.

Emily

The REST API is really cool; I can't wait to see it fully functional. I'll try the Perl API. Actually, I installed it, but I got lost in the Perl classes and data types, and it seemed a bit slow. But if it's the only option, I will look at it again and tell you if I have questions. Thanks Emily.

written 11.0 years ago by Gungor Budak ▴ 270

Have you seen our new online course? There are various scripts in there that you can cannibalise to make life easier.

written 11.0 years ago by Emily 24k

Yes, I have, and I've started watching the tutorials and doing the exercises. It'll definitely help. Thanks.

written 11.0 years ago by Gungor Budak ▴ 270

Ram · Accepted Answer · 2014-09-19

Hi Gungor,

We have taken your suggestion into account and added an option to soft-mask intronic regions.

This option is available on our new REST server, http://rest.ensembl.org, along with improved performance.

The following: http://rest.ensembl.org/sequence/id/ENSG00000157764?content-type=text/plain;mask_feature=1

will return the whole gene sequence, with intron sequences in lower case.
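Once you have such a soft-masked sequence as a string, splitting it into exon and intron segments comes down to splitting on letter case. The following is only a sketch: `split_softmasked` is a hypothetical helper name and the example sequence is made up rather than fetched from the server.

```python
import re


def split_softmasked(sequence):
    """Split a soft-masked sequence into (kind, start, end, text) tuples.

    Upper-case runs are labelled "exon" and lower-case runs "intron";
    start and end are 0-based, end-exclusive offsets into the input.
    """
    segments = []
    for match in re.finditer(r"[A-Z]+|[a-z]+", sequence):
        text = match.group()
        kind = "exon" if text.isupper() else "intron"
        segments.append((kind, match.start(), match.end(), text))
    return segments


example = "ATGGCCgtaagtttcagGTTCTGA"  # made-up soft-masked sequence
segments = split_softmasked(example)
for kind, start, end, text in segments:
    print(kind, start, end, text)
```

The helper relies on letter case alone, so the offsets are positions within the returned string, not genomic coordinates. For a gene with several transcripts, it is worth checking the Ensembl documentation to see which transcripts' exons the upper-case runs represent.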

In your example, http://rest.ensembl.org/sequence/id/ENSG00000197568?content-type=text/plain;type=cds;multiple_sequences=1

you are already retrieving only the coding sequence for each transcript in the gene. Hence, there are no intronic regions.

I hope this helps and please do not hesitate to contact us if you have any further enquiries.

Regards,
Magali