Ensembl exon-to-dna mapping
10.2 years ago
jgbradley1 ▴ 110

Hopefully this question isn't too specific. I am using the latest release of the human genome in the Ensembl database (homo_sapiens_core_76_38), and I would like to map exons to their DNA sequence. The database schema seems to indicate that I can take the seq_region_id from the exon table and use it to look up the dna table. However, there isn't a DNA sequence for every exon. For example, for the exon with exon_id=28550800, its corresponding seq_region_id does not exist in the dna table. This is my first time using Ensembl, so is there something I'm missing?

dna exon ensembl • 3.0k views

Is there a reason you're not just using BioMart? (That link is a query for the exonic sequence of each annotated human exon from release 76.)


Your approach of using BioMart will work, but it still doesn't solve my problem of how the seq_region_id in the exon table maps to the seq_region_id in the dna table. Although the columns have the same name, their values don't correspond in the database. I just did an SQL join between the dna table and the exon table on seq_region_id, and it shows no relation between the two tables.

10.2 years ago
Emily 24k

Magali answered this on the Ensembl dev list as follows:

Exons and other features tend to be stored on toplevel sequences, which are generally chromosomes. DNA sequence, however, is stored at the contig level. The assembly table contains the information needed to map a contig sequence to a chromosome.
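To make the role of the assembly table concrete, here is a minimal Python sketch of the stitching it implies. It assumes the rows have already been read out of the assembly table (the field names follow the core schema columns asm_seq_region_id, cmp_seq_region_id, asm_start, asm_end, cmp_start, cmp_end, ori) and that contig sequences from the dna table are held in a dictionary keyed by seq_region_id. The function name, the toy data, and the choice to pad unmapped positions with N are illustrative assumptions, and a real database may need more than one mapping step (for example chromosome to scaffold to contig), which this sketch does not handle.

```python
from typing import Dict, Iterable, NamedTuple


class AssemblyRow(NamedTuple):
    """One row of the assembly table (coordinates are 1-based, inclusive)."""
    asm_seq_region_id: int   # toplevel region, e.g. a chromosome
    cmp_seq_region_id: int   # component region, e.g. a contig
    asm_start: int
    asm_end: int
    cmp_start: int
    cmp_end: int
    ori: int                 # 1 = same orientation, -1 = reversed


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def toplevel_sequence(region_id: int, start: int, end: int, strand: int,
                      assembly: Iterable[AssemblyRow],
                      contig_dna: Dict[int, str]) -> str:
    """Stitch the sequence of [start, end] on a toplevel region from contigs.

    Positions not covered by any contig are filled with 'N' (an assumption
    of this sketch). If strand is -1 the result is reverse-complemented.
    """
    rows = sorted(
        (r for r in assembly
         if r.asm_seq_region_id == region_id
         and r.asm_end >= start and r.asm_start <= end),
        key=lambda r: r.asm_start,
    )
    pieces = []
    pos = start  # next toplevel position still to be filled
    for r in rows:
        ov_start = max(start, r.asm_start)
        ov_end = min(end, r.asm_end)
        if ov_start > pos:  # gap between contigs
            pieces.append("N" * (ov_start - pos))
        contig = contig_dna[r.cmp_seq_region_id]
        if r.ori == 1:
            c_start = r.cmp_start + (ov_start - r.asm_start)
            c_end = r.cmp_start + (ov_end - r.asm_start)
            pieces.append(contig[c_start - 1:c_end])
        else:
            # Reversed contig: toplevel start maps to the contig's end.
            c_end = r.cmp_end - (ov_start - r.asm_start)
            c_start = r.cmp_end - (ov_end - r.asm_start)
            pieces.append(reverse_complement(contig[c_start - 1:c_end]))
        pos = ov_end + 1
    if pos <= end:  # trailing gap
        pieces.append("N" * (end - pos + 1))
    seq = "".join(pieces)
    return reverse_complement(seq) if strand == -1 else seq


# Toy data (made up): two 10 bp contigs placed end to end on region 1,
# the second one in reverse orientation.
ASSEMBLY = [
    AssemblyRow(1, 101, 1, 10, 1, 10, 1),
    AssemblyRow(1, 102, 11, 20, 1, 10, -1),
]
CONTIG_DNA = {101: "AAACCCGGGT", 102: "TTTTGGGGCC"}

if __name__ == "__main__":
    # An "exon" spanning the contig boundary at positions 8..13.
    print(toplevel_sequence(1, 8, 13, 1, ASSEMBLY, CONTIG_DNA))
```

In terms of the original question, region_id, start, end and strand would come from the exon table's seq_region_id, seq_region_start, seq_region_end and seq_region_strand columns, which is why joining exon directly to dna on seq_region_id finds nothing: the exon's region is a chromosome, while dna rows belong to contigs.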

Retrieving DNA sequence directly from the MySQL schema is tricky in the best of cases. This is why we recommend using BioMart, the Perl API or REST queries for this type of use.
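For the REST route, a minimal Python sketch might look like the following. It assumes the public server at https://rest.ensembl.org and its sequence/id endpoint, requested as plain text, and that an exon stable ID (ENSE...) is accepted there. The function names are my own, and the public server serves the current Ensembl release rather than release 76, so sequences and IDs may differ from an older local database.

```python
import sys
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"  # public server, current release


def sequence_url(stable_id: str, server: str = REST_SERVER) -> str:
    """Build the sequence/id request URL for a stable ID, as plain text."""
    query = urllib.parse.urlencode({"content-type": "text/plain"})
    return "{}/sequence/id/{}?{}".format(
        server, urllib.parse.quote(stable_id), query)


def fetch_sequence(stable_id: str, server: str = REST_SERVER,
                   timeout: float = 30.0) -> str:
    """Fetch the sequence for a stable ID (requires network access)."""
    with urllib.request.urlopen(sequence_url(stable_id, server),
                                timeout=timeout) as response:
        return response.read().decode("ascii").strip()


if __name__ == "__main__":
    # Usage: python fetch_exon.py <exon stable ID>
    print(fetch_sequence(sys.argv[1]))
```

Note that this takes the exon's stable ID, not the internal exon_id used in the MySQL tables.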

9.9 years ago
Tariq Daouda ▴ 220

Hi,

I wrote a Python module for this kind of query on Ensembl data. It's called pyGeno and it is freely available on GitHub: https://github.com/tariqdaouda/pyGeno

Once you've imported the genome into it you can simply do:

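from pyGeno.Genome import *

# Load the previously imported reference genome (Ensembl release 75, GRCh37)
ref = Genome(name = "GRCh37.75")
# Look up an exon by its Ensembl ID; get() returns a list of matches
exon = ref.get(Exon, id = "EN...")[0]

# Coding part of the exon, then the full exon sequence
print(exon.CDS)
print(exon.sequence)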

Hope that helps
