Finding Homologous Exons Between Species Using PyCogent
12.7 years ago
User 9334 ▴ 60

I have a set of exon coordinates in human and want to find their orthologs in mouse, and vice versa, using PyCogent. My reading of the docs suggests that I can find the syntenic region for each exon in the human-mouse alignment, and that this should yield the orthologous exon coordinates. I tried it like this:

from cogent.db.ensembl import Compara
# account is assumed to be defined earlier in the session
compara = Compara(["human", "mouse"], Release=63, account=account)
regions = compara.getSyntenicRegions(CoordName="1", Start=4775654, End=4775821, align_method="PECAN", align_clade="vertebrate")
for my_region in regions:
    print my_region

This yields the error:

  File "/usr/local/lib/python2.6/dist-packages/cogent-1.5.1-py2.6-linux-x86_64.egg/cogent/db/ensembl/compara.py", line 344, in getSyntenicRegions
    ref_genome = self._genomes[_Species.getSpeciesName(Species)]
KeyError: 'None'

I'm using Python 2.6 and PyCogent 1.5.1. Any ideas what might be wrong? What's the easiest way to do this using PyCogent? Thank you.

Edit: the solution is to pass Species="mouse" or Species="human" to getSyntenicRegions(). This works, but it is far too slow. Are there better ways to do this efficiently?
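
A minimal sketch of the corrected call, wrapped in a loop over several exons. The helper name `syntenic_regions_for_exons` and the `(CoordName, Start, End)` tuple layout are illustrative and not part of PyCogent; the keyword arguments are the ones already used above. It only avoids the KeyError and does nothing about the speed problem.

```python
def syntenic_regions_for_exons(compara, species, exons,
                               align_method="PECAN",
                               align_clade="vertebrate"):
    """Yield (exon, region) pairs for each exon's syntenic regions.

    compara: a Compara instance, built as in the question.
    species: the species the coordinates belong to, e.g. "human".
        Leaving this out is what triggered the KeyError above.
    exons: iterable of (CoordName, Start, End) tuples (hypothetical layout).
    """
    for coord_name, start, end in exons:
        regions = compara.getSyntenicRegions(
            Species=species, CoordName=coord_name, Start=start, End=end,
            align_method=align_method, align_clade=align_clade)
        for region in regions:
            yield (coord_name, start, end), region
```

With the `compara` object from the question, iterating over `syntenic_regions_for_exons(compara, "human", [("1", 4775654, 4775821)])` would give the regions for that one exon; each lookup is still a separate database query.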

python ensembl comparative • 3.1k views

Have you considered using EnsEMBL's Perl API?


One way to speed things up (as I do) is to download all the EnsEMBL data and run your own local copy of their MySQL database.

12.7 years ago
Biojl ★ 1.7k

Hi,

I tried PyCogent for a while, but its major drawback, as you may have noticed, is that it is extremely slow. Also, when you want to process lots of information, the connection gets severed after a few genes.

My recommendation is to download all the data from Ensembl BioMart and then write your own script to make the comparisons.


The Ensembl interface makes this information extremely difficult to find. Where can I download Ensembl Compara from, without downloading all of Ensembl? If I want to get all the mouse/human alignments or syntenic regions, for example, where can these be downloaded through BioMart? Thank you.


I don't think you can access this information directly, not even through BioMart. My guess is that it first assigns a gene to your exon coordinates, then searches for its mouse ortholog, and then retrieves that gene's exons to make the comparison. You could do the same with a script. Under Attributes > Sequences you'll find gene and exon positions as attributes to download.
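
A rough sketch of such a script, assuming three tab-delimited tables have already been exported: human exons, mouse exons, and human-to-mouse ortholog pairs. The column names (`gene_id`, `exon_id`, `chrom`, `start`, `end`, `human_gene_id`, `mouse_gene_id`), the function names, and the toy rows are all made up for illustration; a real BioMart export will have different headers and real identifiers.

```python
import csv
import io
from collections import defaultdict

# Toy stand-ins for exported tables (hypothetical IDs and coordinates).
HUMAN_EXONS_TSV = (
    "gene_id\texon_id\tchrom\tstart\tend\n"
    "HUMAN_GENE_A\tHUMAN_EXON_A1\t1\t4775600\t4775900\n"
    "HUMAN_GENE_B\tHUMAN_EXON_B1\t1\t5000000\t5000200\n"
)
MOUSE_EXONS_TSV = (
    "gene_id\texon_id\tchrom\tstart\tend\n"
    "MOUSE_GENE_A\tMOUSE_EXON_A1\t4\t1000\t1167\n"
    "MOUSE_GENE_A\tMOUSE_EXON_A2\t4\t2000\t2100\n"
    "MOUSE_GENE_B\tMOUSE_EXON_B1\t4\t9000\t9100\n"
)
ORTHOLOGS_TSV = (
    "human_gene_id\tmouse_gene_id\n"
    "HUMAN_GENE_A\tMOUSE_GENE_A\n"
    "HUMAN_GENE_B\tMOUSE_GENE_B\n"
)


def read_exons(handle):
    """Group exon rows by gene: {gene_id: [(chrom, start, end, exon_id)]}."""
    exons_by_gene = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        exons_by_gene[row["gene_id"]].append(
            (row["chrom"], int(row["start"]), int(row["end"]), row["exon_id"]))
    return exons_by_gene


def read_orthologs(handle):
    """Map each human gene ID to a list of mouse ortholog gene IDs."""
    orthologs = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        orthologs[row["human_gene_id"]].append(row["mouse_gene_id"])
    return orthologs


def genes_overlapping(exons_by_gene, chrom, start, end):
    """Step 1: genes with at least one exon overlapping the query interval."""
    hits = []
    for gene_id, exons in exons_by_gene.items():
        if any(c == chrom and s <= end and start <= e for c, s, e, _ in exons):
            hits.append(gene_id)
    return sorted(hits)


def candidate_mouse_exons(query, human_exons, orthologs, mouse_exons):
    """Steps 2 and 3: look up mouse orthologs, then collect their exons."""
    chrom, start, end = query
    candidates = []
    for human_gene in genes_overlapping(human_exons, chrom, start, end):
        for mouse_gene in orthologs.get(human_gene, []):
            candidates.extend(mouse_exons.get(mouse_gene, []))
    return sorted(candidates)


human_exons = read_exons(io.StringIO(HUMAN_EXONS_TSV))
mouse_exons = read_exons(io.StringIO(MOUSE_EXONS_TSV))
orthologs = read_orthologs(io.StringIO(ORTHOLOGS_TSV))
candidates = candidate_mouse_exons(
    ("1", 4775654, 4775821), human_exons, orthologs, mouse_exons)
```

Note that this only narrows things down to the exons of the orthologous gene. Deciding which of those exons actually corresponds to the query exon still needs a comparison step (for example by sequence), which is not shown here.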


At some point in the future, I'd like to look at improving the PyCogent MySQL code. It runs using SQLAlchemy, which has been shown to be quite slow in various benchmarking studies. This isn't high on my list however :(
