I am trying to calculate the number of paralogs for a few genes in different species in Ensembl. I am wondering if there is any tool which can do it automatically. Thank you.
Since it looks from your question like you would like to use Python for this, you could also use the Ensembl REST API.
It's easy to get all orthologues for a gene, for example for the human ABCD1 gene:
or all paralogues for the same gene:
Using these REST requests, it should be quite easy to start with a particular gene in, e.g., human, find its orthologues in the other Ensembl species, and then get the number of paralogues for each of those genes.
The REST documentation for each endpoint shows how to call it from Python.
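As a rough illustration of that workflow, here is a minimal Python sketch. It assumes the public server at https://rest.ensembl.org, the `homology/symbol/:species/:symbol` endpoint with `type` and `format` query parameters, and a JSON response shaped like `{"data": [{"homologies": [...]}]}`. The function names are my own, and the endpoint and response layout should be checked against the current REST documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Assumption: the public Ensembl REST server.
REST_SERVER = "https://rest.ensembl.org"


def fetch_homologies(species, symbol, homology_type):
    """Request homologies for a gene symbol.

    homology_type is assumed to be 'orthologues' or 'paralogues'.
    This performs a network request and returns the parsed JSON.
    """
    query = urllib.parse.urlencode({"type": homology_type, "format": "condensed"})
    url = f"{REST_SERVER}/homology/symbol/{species}/{symbol}?{query}"
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def list_homologies(payload):
    """Flatten every 'homologies' list found under 'data' (assumed layout)."""
    return [
        homology
        for entry in payload.get("data", [])
        for homology in entry.get("homologies", [])
    ]


def count_by_type(payload):
    """Count homologies per 'type' field, e.g. to separate paralogue classes."""
    return Counter(h.get("type", "unknown") for h in list_homologies(payload))
```

With this, `len(list_homologies(fetch_homologies("human", "ABCD1", "paralogues")))` would give the paralogue count for human ABCD1, provided the server returns the layout assumed above. The same call with `"orthologues"` would return the orthologous genes, which can then be queried for their own paralogues species by species.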
Hope this helps.
Have you tried Ensembl Compara? I think they use a pipeline that builds gene trees for gene families and calls orthologs and paralogs from those trees. You can use the Ensembl API to access that information.
http://www.ensembl.org/info/genome/compara/homology_method.html
http://www.ensembl.org/info/docs/api/compara/compara_tutorial.html
Ensembl REST is very useful, but I think it needs a little parsing to count the paralogs.
I quickly tried this:
cut -d " " -f 1 biomart_results.txt | sort | uniq -c
Limitation: if you have more than 500 IDs, you will need to run it multiple times.
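If you would rather do the counting in Python, here is a small sketch of the same idea. It assumes the export has one gene/paralog pair per line with the gene ID in the first column, and it splits on any whitespace, so it is only roughly equivalent to the `cut` command above (a header line, if present, would be counted too). The function names and the batching helper are my own illustration of working around the 500 ID limit.

```python
from collections import Counter


def count_first_column(lines):
    """Count occurrences of each first-column value, like `cut | sort | uniq -c`."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def batches(ids, size=500):
    """Split a collection of IDs into consecutive chunks of at most `size` items."""
    ids = list(ids)
    return [ids[i:i + size] for i in range(0, len(ids), size)]
```

For a file, something like `with open("biomart_results.txt") as handle: counts = count_first_column(handle)` should work, and the counters from several batches can be added together with `+`.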