Question

Finding Homologous Exons Between Species Using Pycogent


13.2 years ago

User 9334 ▴ 60

I have a set of exon coordinates in human for which I want to find the orthologs in mouse, and vice versa, using PyCogent. My reading of the docs suggests that I can find the syntenic region for each exon in the human-mouse alignment, and that this should yield the orthologous exon coordinates. I tried this strategy as follows:

from cogent.db.ensembl import Compara
# `account` is defined elsewhere (not shown)
compara = Compara(["human", "mouse"], Release=63, account=account)
regions = compara.getSyntenicRegions(CoordName="1", Start=4775654, End=4775821, align_method="PECAN", align_clade="vertebrate")
for my_region in regions:
    print my_region

This yields the error:

  File "/usr/local/lib/python2.6/dist-packages/cogent-1.5.1-py2.6-linux-x86_64.egg/cogent/db/ensembl/compara.py", line 344, in getSyntenicRegions
    ref_genome = self._genomes[_Species.getSpeciesName(Species)]
KeyError: 'None'

I'm using Python 2.6 and PyCogent 1.5.1. Any ideas what might be wrong? What's the easiest way to do this using PyCogent? Thank you.

Edit: the solution is to pass Species="mouse" or Species="human" to getSyntenicRegions(). This works, but it is far too slow. Is there a more efficient way to do this?
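A minimal sketch of that fix, assuming a `compara` object built as in the snippet above. The helper name `syntenic_regions_for_exons` and its arguments are hypothetical (they are not part of PyCogent); the only library call is `getSyntenicRegions()` with the keyword arguments already shown in this post plus `Species`. It only illustrates the corrected call on one shared `Compara` object and does not by itself address the speed problem.

```python
def syntenic_regions_for_exons(compara, species, exons,
                               align_method="PECAN",
                               align_clade="vertebrate"):
    """Query one shared Compara object for each (chrom, start, end) exon.

    Hypothetical helper: `species` should name the genome that the
    coordinates in `exons` refer to.
    """
    results = {}
    for chrom, start, end in exons:
        regions = compara.getSyntenicRegions(
            Species=species,  # the argument missing from the original call
            CoordName=chrom,
            Start=start,
            End=end,
            align_method=align_method,
            align_clade=align_clade,
        )
        # Materialise the result so it can be inspected after the loop.
        results[(chrom, start, end)] = list(regions)
    return results

# Usage, with `compara` and `species` set up by the caller:
# hits = syntenic_regions_for_exons(compara, species, [("1", 4775654, 4775821)])
```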

python ensembl comparative • 3.4k views

updated 13.2 years ago by Biojl ★ 1.7k • written 13.2 years ago by User 9334 ▴ 60


Have you considered using EnsEMBL's Perl API?

13.0 years ago by Steve Moss 2.3k


One way to speed things up (as I do) is to download all the EnsEMBL data and run your own local copy of their MySQL database.

13.0 years ago by Steve Moss 2.3k

score 0 · Answer 1 · 2012-02-27


13.2 years ago

Biojl ★ 1.7k

Hi,

I tried PyCogent for a while, but its major drawback, as you may have noticed, is that it is extremely slow. Also, when you want to process lots of information, the connection gets severed after a few genes.

My recommendation is to download all the data from Ensembl BioMart and then write your own script to make the comparisons.
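As a rough illustration of the scripting side, here is a sketch that reads a tab-separated export with a header row and groups exons by gene. The column names (`gene_id`, `exon_id`, `chrom`, `start`, `end`) and the helper `load_exons` are assumptions made up for this example; a real BioMart download will have its own headers, so adjust the keys accordingly.

```python
import csv
import io
from collections import defaultdict

def load_exons(handle):
    """Group exon rows by gene from a tab-separated table with a header row."""
    exons_by_gene = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        exons_by_gene[row["gene_id"]].append(
            (row["exon_id"], row["chrom"], int(row["start"]), int(row["end"]))
        )
    return dict(exons_by_gene)

# Tiny made-up table standing in for a downloaded file.
example = io.StringIO(
    "gene_id\texon_id\tchrom\tstart\tend\n"
    "geneA\texonA1\t1\t100\t200\n"
    "geneA\texonA2\t1\t300\t400\n"
    "geneB\texonB1\t2\t50\t90\n"
)
exons = load_exons(example)
print(exons["geneA"])
```

With a real file you would pass an open file handle instead of the `io.StringIO` object.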

13.2 years ago by Biojl ★ 1.7k


The Ensembl interface makes this information extremely difficult to find. Where can I download Ensembl Compara from, without downloading all of Ensembl? If I want to get all the mouse/human alignments or syntenic regions, for example, where can these be downloaded through BioMart? Thank you.

13.2 years ago by User 9334 ▴ 60


I don't think you can access this information directly, and neither can BioMart. My guess is that it first assigns a gene to your exon coordinates, then searches for its mouse ortholog, and then searches for that ortholog's exons to make the comparison. You could do the same with a script. Under attributes > sequences you'll find gene and exon positions as attributes to download.
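The three steps described above could be sketched roughly like this. All function names, data layouts and identifiers are invented for illustration, and the sketch assumes one-to-one orthologs; it returns every exon of the orthologous gene as a candidate, so choosing the matching exon would still need a further comparison step.

```python
def find_gene(chrom, start, end, genes):
    """Return the id of the first gene whose span overlaps the query interval."""
    for gene_id, (g_chrom, g_start, g_end) in genes.items():
        if g_chrom == chrom and g_start <= end and start <= g_end:
            return gene_id
    return None

def ortholog_exons(chrom, start, end, genes, orthologs, exons_by_gene):
    """Exon coordinates -> containing gene -> ortholog -> that ortholog's exons."""
    gene_id = find_gene(chrom, start, end, genes)
    if gene_id is None:
        return []
    ortholog_id = orthologs.get(gene_id)
    if ortholog_id is None:
        return []
    return exons_by_gene.get(ortholog_id, [])

# Made-up toy data; real tables would come from your downloaded files.
human_genes = {"hGeneA": ("1", 1000, 5000)}
human_to_mouse = {"hGeneA": "mGeneA"}
mouse_exons = {"mGeneA": [("4", 2000, 2150), ("4", 2600, 2800)]}

candidates = ortholog_exons("1", 1200, 1350,
                            human_genes, human_to_mouse, mouse_exons)
print(candidates)
```

The linear scan in `find_gene` is only meant to show the logic; for many exons an indexed lookup per chromosome would be the natural next step.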

13.2 years ago by Biojl ★ 1.7k


At some point in the future, I'd like to look at improving the PyCogent MySQL code. It runs using SQLAlchemy, which has been shown to be quite slow in various benchmarking studies. This isn't high on my list however :(

13.0 years ago by Steve Moss 2.3k