Ensembl & biomaRt: extracting in-frame codons near specified position
9.5 years ago
bsmith030465

Hi,

My objective is to find the frequency of in-frame codons within 50 bp of a specified location. I have the Ensembl transcript ID and the genomic coordinates, e.g.:

chromosome    position    strand    ensemblTID
chr2    219130603    -    ENST00000538028

I have been trying to use the biomaRt functions getBM and getSequence, but I'm not even close to getting what I want.

Any help to point me in the correct direction would be appreciated!

Thanks!

9.5 years ago
Tariq Daouda

A very similar example of what you need is on the front page of pyGeno's website.

Here's a way to get what you're asking for:

from pyGeno.Genome import *

ref = Genome(name = "GRCh37.75") # or whichever other reference genome you have imported into pyGeno
# as written, this filter keeps only exons lying entirely within 50 bp of the position
exons = ref.get(Exon, {"chromosome.number" : "2", "start >=": 219130603 - 50, "end <=" : 219130603 + 50 } )

# to print the sequences, for example:
for e in exons :
  print(e.sequence)
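The `ref.get` filter above keeps only exons lying entirely inside the ±50 bp window, so an exon that merely overlaps the window (for example, a long exon that contains the position) is dropped. The plain-Python sketch below contrasts the two interval tests; it does not use pyGeno, and the exon tuples are made-up coordinates (1-based, inclusive) chosen only for illustration.

```python
# Hypothetical exons as (start, end), 1-based inclusive genomic coordinates.
POS = 219130603
FLANK = 50

def contained(start, end, lo, hi):
    """True if [start, end] lies entirely inside [lo, hi]."""
    return start >= lo and end <= hi

def overlaps(start, end, lo, hi):
    """True if [start, end] shares at least one base with [lo, hi]."""
    return start <= hi and end >= lo

lo, hi = POS - FLANK, POS + FLANK
exons = [
    (219130500, 219130700),  # long exon spanning the whole window
    (219130580, 219130620),  # short exon inside the window
    (219130400, 219130560),  # exon overlapping the window's left edge
    (219131000, 219131100),  # exon entirely outside the window
]

print([e for e in exons if contained(*e, lo, hi)])  # only the short exon
print([e for e in exons if overlaps(*e, lo, hi)])   # the first three exons
```

If overlapping exons are what you want, the pyGeno filter would need the overlap condition instead; check pyGeno's documentation for the exact query syntax rather than assuming it.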

Hi Tariq,

That looks like an interesting package. I tried your code and got the following error:

Traceback (most recent call last):
  File "/Applications/Wing101.app/Contents/Resources/src/debug/tserver/_sandbox.py", line 7, in <module>
  File "/Library/Python/2.7/site-packages/pyGeno/Genome.py", line 67, in __init__
    pyGenoRabaObjectWrapper.__init__(self, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/pyGeno/pyGenoObjectBases.py", line 83, in __init__
    self.wrapped_object = self._wrapped_class(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/rabaDB/Raba.py", line 301, in __call__
    raise KeyError("Couldn't find any object that fit the arguments you've prodided to the constructor")
KeyError: "Couldn't find any object that fit the arguments you've prodided to the constructor"

Also, I'm not sure this is quite what I am looking for. I think that using biomaRt and Biostrings I can get the sequence information (given the coordinates).

9.5 years ago
Emily

I don't think you're going to get the data you need through BioMart. BioMart is a gene-centric tool: it works by defining a list of genes (filters) and then returning information about those genes (attributes). BioMart attributes also tend to be things the community commonly looks for, not very esoteric things.
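To make the filters/attributes model concrete, here is a minimal sketch that builds a BioMart XML query selecting one transcript (filter) and asking for exon-level structure fields (attributes). The dataset name `hsapiens_gene_ensembl`, the filter `ensembl_transcript_id` and all attribute names are assumptions based on the Ensembl human mart; check them against the mart's attribute list (e.g. `listAttributes()` in biomaRt) before use. The sketch only builds the query string and does not send it anywhere.

```python
import xml.etree.ElementTree as ET

def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart XML query: filters select records, attributes pick output columns."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")

# Assumed attribute names (exon structure fields); verify them with listAttributes().
xml = build_biomart_query(
    "hsapiens_gene_ensembl",
    {"ensembl_transcript_id": "ENST00000538028"},
    ["ensembl_exon_id", "rank", "genomic_coding_start", "genomic_coding_end", "phase", "strand"],
)
print(xml)
```

Queries exported from the BioMart web interface also carry an XML declaration and a `<!DOCTYPE Query>` line, which you may need to prepend. Note that the query can only say "these genes/transcripts, these columns"; it has no way to express "codons within 50 bp of this position", which is the limitation described above.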

What you're trying to do is define a locus and get very specific data on the genomic region around it; BioMart just isn't going to do that. I think you'll need to use the Ensembl Perl API instead. It gives completely flexible access to the Ensembl database, so you can define your regions of interest and get whatever data you like for them. There's an online course on using the API here; you'll only need the Core module.

Hi Emily,

Thanks for the reply! Hmm... I don't know whether my question is too esoteric! I think that with a little post-processing I may be able to get the answer, but I may be wrong. Anyway, here's how I see the problem:

  1. Given the Ensembl transcript ID, identify the coding start and coding end coordinates.
  2. Given the coding coordinates, get the DNA sequence.
  3. Convert the DNA sequence to an mRNA sequence.
  4. From the 5' end, identify where the first codon starts.
  5. Given the coordinates of the first codon, keep moving downstream until you hit the region of interest (location plus/minus 50 bp).
  6. Identify the codons in this region.
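Steps 4 to 6 can be sketched in a few lines, assuming you already have the spliced coding sequence (CDS) as a string, 5' to 3', starting at the first base of the start codon, and the position of interest expressed as a 0-based offset into that CDS. Under those assumptions the frame is fixed (codons start at offsets 0, 3, 6, ...), so there is no need to walk codon by codon from the 5' end. The function name `codons_in_window`, counting only codons that lie completely inside the window, and measuring the ±50 bases along the CDS rather than along the genome are all choices made for this sketch.

```python
from collections import Counter

def codons_in_window(cds, offset, flank=50):
    """Count in-frame codons lying entirely within [offset - flank, offset + flank].

    cds: spliced coding sequence, 5'->3', frame 0 at index 0 (assumed).
    offset: 0-based position of interest within cds (assumed already mapped).
    """
    lo = max(0, offset - flank)             # first base of the window
    hi = min(len(cds), offset + flank + 1)  # one past the last base of the window
    first = -(-lo // 3) * 3                 # first codon start at or after lo
    counts = Counter()
    for start in range(first, hi - 2, 3):   # start + 3 <= hi keeps the codon inside
        counts[cds[start:start + 3].upper()] += 1
    return counts

cds = "ATGAAACCCGGGTTTTAA"
print(codons_in_window(cds, offset=7, flank=3))  # Counter({'CCC': 1})
```

Step 3 does not change the counts: if you want mRNA-style codons, apply `cds.replace("T", "U")` before counting.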

Am I thinking about this correctly? Did I pose the question correctly?

Many thanks for all your help!

It's too esoteric for BioMart to get you the answer all by itself, but if you're happy to do some post-processing, that's fine. You might find it easier to get the exon attributes, since introns will disrupt your coding frame. You can get the exon sequence, phase and coding start/end, which might be easier to work with.
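A minimal sketch of the exon-based approach: given each exon's genomic coding start/end (1-based, inclusive) listed in transcript order (exon rank), a genomic position can be converted into an offset within the spliced CDS, which then fixes its codon (`offset // 3`) and its position within that codon (`offset % 3`). The function and its inputs are hypothetical; it assumes the coding segments are complete and concatenate to a CDS that begins with the start codon, so it does not use exon phase. On the reverse strand (as for ENST00000538028 in the question), transcript order runs from high to low genomic coordinates, so offsets are counted back from each segment's end.

```python
def genomic_to_cds_offset(pos, coding_segments, strand):
    """Map a 1-based genomic position to a 0-based offset in the spliced CDS.

    coding_segments: (coding_start, coding_end) per exon, 1-based inclusive,
    in transcript (rank) order; exons without coding sequence are left out.
    strand: 1 for the forward strand, -1 for the reverse strand.
    Returns None if pos falls in an intron or outside the CDS.
    """
    offset = 0
    for start, end in coding_segments:
        if start <= pos <= end:
            # Forward strand reads left to right; reverse strand reads right to left.
            within = pos - start if strand == 1 else end - pos
            return offset + within
        offset += end - start + 1  # length of this exon's coding part
    return None

# Hypothetical reverse-strand transcript with two coding exons (rank 1 is the rightmost).
segments = [(200, 208), (100, 105)]
off = genomic_to_cds_offset(104, segments, strand=-1)
print(off, off // 3, off % 3)  # 10 3 1
```

The resulting offset can serve as the position of interest when scanning the CDS for nearby codons. If your coding segments are incomplete at the 5' end, you would need the exon phase to recover the frame, which this sketch does not handle.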
