Hello all
I have some gene IDs from Ensembl and I want to get the exon and intron sequences of their transcripts, so that later I can determine exon/intron boundaries and do some analyses.
I've discovered the Ensembl REST API, which is a really easy and clean way of getting data, and played around with it a bit. Using this API, I could get the coding transcripts of the genes and then the sequences of these transcripts. However, I couldn't find any way to distinguish exonic from intronic regions in these sequences.
Here is my script, which gets the sequences of the transcripts of the gene "ENSG00000197568" in FASTA format. I want to get the exons and introns the way Ensembl shows them here.
#!/usr/local/bin/python
# Python 2 script: fetch the sequences of a gene's coding transcripts as FASTA.
import httplib2, sys, re

def check_response(response):
    # Stop on anything other than HTTP 200 OK.
    if not response.status == 200:
        print "Invalid response: ", response.status
        sys.exit(1)

http = httplib2.Http(".cache")
server = "http://beta.rest.ensembl.org/"
gene_id = "ENSG00000197568"
content_type = "text/x-fasta"

# Step 1: request the CDS of every transcript of the gene.
# Each FASTA header line (">...") is taken to be a transcript ID.
query = "sequence/id/" + gene_id + "?type=cds;multiple_sequences=1"
response, content = http.request(server + query, method="GET", headers={"Content-Type":content_type})
check_response(response)
transcripts = re.findall(">(.*)", content)

# Step 2: request each transcript's sequence and collect them in one FASTA file.
f = open("output.fasta", "w")
for transcript in transcripts:
    query = "sequence/id/" + transcript
    response, content = http.request(server + query, method="GET", headers={"Content-Type":content_type})
    check_response(response)
    f.write(content)
f.close()
Thanks in advance
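For what it's worth, once exon coordinates are available for a transcript, cutting its unspliced sequence into exons and introns is plain arithmetic. Below is a minimal sketch of that step only, not an official Ensembl method. It assumes you already have the transcript's genomic start and end, its strand (1 or -1), a list of exon (start, end) pairs in 1-based inclusive genomic coordinates, and the unspliced sequence written in transcript orientation (5' to 3'). Where those coordinates come from (the Perl API, or a REST endpoint if it exposes them) is left open here. The function name split_exons_introns and the example coordinates are made up for illustration.

```python
def split_exons_introns(seq, tx_start, tx_end, strand, exons):
    """Cut an unspliced transcript sequence into exon and intron pieces.

    seq      -- unspliced sequence in transcript orientation (assumption)
    tx_start -- genomic start of the transcript, 1-based inclusive
    tx_end   -- genomic end of the transcript, 1-based inclusive
    strand   -- 1 for the forward strand, -1 for the reverse strand
    exons    -- list of (start, end) genomic coordinates, 1-based inclusive
    Returns a list of (kind, subsequence) tuples in 5' to 3' order.
    """
    if len(seq) != tx_end - tx_start + 1:
        raise ValueError("sequence length does not match transcript span")

    # Convert genomic coordinates to 0-based, half-open offsets along seq.
    spans = []
    for start, end in exons:
        if strand == 1:
            lo, hi = start - tx_start, end - tx_start + 1
        else:
            # On the reverse strand the transcript begins at tx_end.
            lo, hi = tx_end - end, tx_end - start + 1
        spans.append((lo, hi))
    spans.sort()

    segments = []
    prev_hi = None
    for lo, hi in spans:
        # Whatever lies between two consecutive exons is an intron.
        if prev_hi is not None and lo > prev_hi:
            segments.append(("intron", seq[prev_hi:lo]))
        segments.append(("exon", seq[lo:hi]))
        prev_hi = hi
    return segments


# Made-up toy transcript: 20 bases, three exons, forward strand.
example = split_exons_introns(
    "AAAAAcccccGGGGtttCCC", 101, 120, 1,
    [(101, 105), (111, 114), (118, 120)],
)
for kind, piece in example:
    print("%s\t%s" % (kind, piece))
```

The exon/intron boundaries are then simply the positions where one piece ends and the next begins. Sequence before the first exon or after the last one is not returned, on the assumption that a transcript starts and ends with an exon.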
The REST API is really cool; I can't wait to see it fully functional. I'll try the Perl API. Actually, I installed it, but I got lost in the Perl classes and data types, and it seemed a bit slow. But if it's the only option, I will look at it again and let you know if I have questions. Thanks Emily.
Have you seen our new online course? There are various scripts in there that you can cannibalise to make life easier.
Yes, I have, and I've started watching the tutorials and doing the exercises. It'll definitely help. Thanks