Fetching Gene List For A Protein Family
14.3 years ago
Ananth ▴ 90

I would like to find the number of genes in an organism that code for a particular protein. For example, the human argonaute protein has 4 genes (AGO1, AGO2, AGO3, AGO4).

I would like to fetch this kind of information for other organisms. What should be the search strategy for this?

protein • 3.8k views
13.0 years ago

The EnsEMBL Perl API provides functions to retrieve this sort of information quite simply. For an overview, see the Compara API tutorial, specifically the Family Objects section.

An example for your gene in question would be as follows:

#!/usr/bin/env perl

# setup module imports
use strict;
use warnings;

use Bio::EnsEMBL::Registry;
use Data::Dumper;
use Time::HiRes qw(gettimeofday);

# start time
my $start_time = gettimeofday;

# setup registry and connect to database
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(   -host => 'ensembldb.ensembl.org',
                    -port => 5306,
                    -user => 'anonymous',
                    -pass => undef);

# setup family adaptor
my $family_adaptor = Bio::EnsEMBL::Registry->get_adaptor("Multi","Compara","Family");

print "Retrieving argonaute protein family...\n";
# fetch the protein families whose description matches the search string
my $families = $family_adaptor->fetch_by_description_with_wildcards("ARGONAUTE EUKARYOTIC TRANSLATION INITIATION FACTOR", 1);
print "Retrieved " . scalar(@{$families}) . " protein families.\n\n";

# we expect 1 family
if (scalar(@{$families}) != 1) {
    die("We expect only 1 protein family.\n");
}

# traverse families and get attributes
print "Getting all member genes for protein family...\n";
my $done = 0;
while (my $family = shift @{$families}) {
    print join("\t", map { $family->$_ }  qw(stable_id description description_score))."\n\n";

    # fetch the member list once, then traverse it without modifying it
    my $member_attributes = $family->get_Member_Attribute_by_source("ENSEMBLGENE");
    foreach my $member_attribute (@{$member_attributes}) {
        my ($member, $attribute) = @{$member_attribute};

        my $taxon = $member->taxon;
        print $member->stable_id, " ", $taxon->name, "\n";
        $done++;
    }
}
print "...processed $done members!\n\n";

# end time
my $end_time = gettimeofday;

# total time
my $total_time = ($end_time - $start_time);
print "Finished in $total_time seconds\n";

This will return a list of gene IDs and taxonomic binomials for all 209 members of this protein family, which matches what we would expect here.
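
Since the original question asks for the number of genes per organism, the printed list can be tallied by species. The following is a minimal sketch, not part of the EnsEMBL API: `count_genes_per_species` is a hypothetical helper, and the gene IDs in the example are placeholders rather than real identifiers. It assumes each line has the form "<stable_id> <species name>", as printed by the script above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: count distinct gene IDs per species from lines of
# the form "<stable_id> <species name>".
sub count_genes_per_species {
    my (@lines) = @_;
    my %genes_by_species;
    for my $line (@lines) {
        chomp $line;
        # first token is the gene ID, the remainder is the species name;
        # lines that do not match (e.g. blank lines) are skipped
        my ($gene_id, $species) = $line =~ /^(\S+)\s+(.+?)\s*$/ or next;
        $genes_by_species{$species}{$gene_id} = 1;
    }
    my %counts;
    for my $species (keys %genes_by_species) {
        $counts{$species} = scalar keys %{ $genes_by_species{$species} };
    }
    return \%counts;
}

# Placeholder input; in practice, capture the output of the script above.
my @example_lines = (
    "GENE_A Homo sapiens",
    "GENE_B Homo sapiens",
    "GENE_C Mus musculus",
);
my $example_counts = count_genes_per_species(@example_lines);
for my $species (sort keys %{$example_counts}) {
    print "$species\t$example_counts->{$species}\n";
}
```

Keying the inner hash on the gene ID means a gene listed more than once is still counted only once for its species.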

Hope this helps?


New methods, new approaches, new data, new (favorite) coding language - all make a new answer to an old question relevant.


Lol, just noticed this was asked a year ago :S Thought it was August this year!

14.3 years ago

The answers to this previous question on subunits of enzymes might help you; databases of protein complexes such as CORUM were suggested there. You could also use a database of orthologous proteins like eggNOG (AGO family = KOG1041?), if this matches your idea of a protein family.


The information I am looking for is the number of genes, or the gene names, that code for a particular protein family in an organism.

For example, I would like to know how many genes code for the human Argonaute protein subfamily.


Actually, eggNOG will give orthologous genes in different organisms, whereas I need the genes coding for the same protein family in the same organism.


eggNOG will give you both - you just need to filter down the results to the organism that you are interested in.

14.3 years ago

Exactly as Michael Kuhn states: if this is your idea of a protein family. To best answer this question, you need to define up front what you mean by a protein or gene family. For example, are all cytochrome P450 genes in one family, or do you classify by sub-type, with CYP1s distinct from CYP2s, CYP4s, etc.? The answer depends on what you will do with these protein/gene families. You also need to decide whether these are gene families or protein families, because one gene can encode more than one protein, and one of those isoforms may differ considerably in length and function from the others.
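
To illustrate the gene-versus-protein distinction, here is a small sketch that reports the size of a family at both levels. It assumes you already have a mapping from protein (isoform) IDs to gene IDs; `family_sizes` and all IDs shown are hypothetical and used only for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: given a hashref of protein ID => gene ID, return
# the number of proteins and the number of distinct genes encoding them.
sub family_sizes {
    my ($protein_to_gene) = @_;
    my %genes;
    $genes{$_} = 1 for values %{$protein_to_gene};
    return (scalar keys %{$protein_to_gene}, scalar keys %genes);
}

# Placeholder mapping: two isoforms come from GENE_A, one from GENE_B.
my %example_family = (
    PROT_1 => "GENE_A",
    PROT_2 => "GENE_A",
    PROT_3 => "GENE_B",
);
my ($n_proteins, $n_genes) = family_sizes(\%example_family);
print "proteins: $n_proteins, genes: $n_genes\n";
```

The two counts differ whenever a gene contributes more than one isoform, which is why the choice of level matters for the answer.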
