How To Find Pseudogenes Of A Given Protein?
11.6 years ago
Frenkiboy ▴ 260

My question is rather trivial:

Is there a resource I can use to find the pseudogenes of my favorite set of genes using only identifiers (without having to run sequence comparisons)?

Best regards

11.6 years ago
Emily 24k

One option may be to use the Ensembl API. You could write a Perl script that searches for your gene in the database, identifies all transcripts of the gene, and selects those with a pseudogene biotype.

There are instructions on downloading the API here:

http://www.ensembl.org/info/docs/api/api_installation.html

There's a tutorial on using the API here:

http://www.ensembl.org/info/docs/api/core/core_tutorial.html

The documentation is here:

http://www.ensembl.org/info/docs/Doxygen/core-api/index.html

Let me know if you need any help with this.


While some pseudogenes are transcribed, most are not, so I'm not sure where that leaves the Ensembl transcripts. HGNC curates some pseudogenes, but not many. You could cluster hypothetical ORFs, but since most pseudogenes should carry frameshifts, stop codons or other transcription/translation breakers, that won't be easy.
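To illustrate why those breakers complicate ORF clustering, here is a minimal Python sketch (an illustration, not an established pipeline) that lists premature in-frame stop codons in a DNA string. It assumes the standard genetic code and a sequence already in the reading frame of interest; the function name is hypothetical, and a real pseudogene scan would also have to cope with frameshifts, for example by checking every frame against the parent protein.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stops(seq: str, frame: int = 0) -> list[int]:
    """Return 0-based offsets of stop codons in the given frame, ignoring a
    stop in the final complete codon (the normal terminator)."""
    seq = seq.upper()
    codon_starts = range(frame, len(seq) - 2, 3)  # starts of complete codons
    hits = [i for i in codon_starts if seq[i:i + 3] in STOP_CODONS]
    last = codon_starts[-1] if codon_starts else None
    return [i for i in hits if i != last]
```

A non-empty result means translation in that frame would terminate early, which is exactly what makes such ORFs hard to match to a parent gene by translated sequence alone.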


This script works:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Connect to the public Ensembl MySQL server as an anonymous user
my $registry = "Bio::EnsEMBL::Registry";

$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);

my $gene_adaptor = $registry->get_adaptor('Mouse', 'Core', 'Gene');

my $gene_name = "Optn";

# Transcript biotypes treated as pseudogenes (extend this list as needed)
my %pseudogene_biotypes = map { $_ => 1 } qw(
    processed_pseudogene
    IG_C_pseudogene
    IG_J_pseudogene
    IG_V_pseudogene
    polymorphic_pseudogene
    pseudogene
    unprocessed_pseudogene
    TR_J_pseudogene
    TR_V_pseudogene
);

# All genes whose external name (e.g. the gene symbol) matches $gene_name
my @genes = @{ $gene_adaptor->fetch_all_by_external_name($gene_name) };

foreach my $gene (@genes) {
    print $gene->external_name, ", ", $gene->stable_id, "\n";

    # Print the stable ID of every transcript with a pseudogene biotype
    foreach my $transcript (@{ $gene->get_all_Transcripts }) {
        print $transcript->stable_id, "\n"
            if $pseudogene_biotypes{ $transcript->biotype };
    }
}

Edit the script to use different genes, change what it prints, etc. Also, check the possible biotypes (which, as Khader says, you can find in BioMart) and add any others you think are relevant to your search.

11.6 years ago

What about pseudogene.org?

Welcome to Pseudogene.org. The site is developed and maintained by Yale Gerstein Group. This site contains a comprehensive database of identified pseudogenes, utilities used to find pseudogenes, various publication data sets and a pseudogene knowledgebase.

You can download the full gene list or per-chromosome lists in CSV/GTF format and then cross-query them against your custom list using R or Perl/Python.
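A minimal Python sketch of that cross-query step, assuming you downloaded a CSV file: it keeps the rows whose identifier column matches your own list. The function name `cross_query` is hypothetical, and which column to match on depends on the header of the file you actually download, so it is passed in rather than guessed.

```python
import csv
from collections.abc import Iterable


def cross_query(lines: Iterable[str], key_column: str,
                my_ids: Iterable[str]) -> list[dict]:
    """Keep the rows of a downloaded CSV whose `key_column` value is in `my_ids`.

    `lines` may be an open file or any iterable of CSV lines (header first).
    Matching ignores case and surrounding whitespace.
    """
    wanted = {i.strip().upper() for i in my_ids}
    reader = csv.DictReader(lines)
    # Fail loudly instead of silently matching nothing on a wrong column name.
    if reader.fieldnames is None or key_column not in reader.fieldnames:
        raise KeyError(f"column {key_column!r} not found in the CSV header")
    return [row for row in reader
            if (row[key_column] or "").strip().upper() in wanted]


# Usage (file name and column name are placeholders):
# with open("pseudogenes.csv", newline="") as fh:
#     hits = cross_query(fh, "parent", ["Optn"])
```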


pseudogene.org has outdated annotations, corresponding to the May 2004 genome build (unless I missed the recent ones, which is quite possible), and it takes two liftovers to bring the coordinates to mm9/mm10.

11.6 years ago

In the current Ensembl release (Ensembl Genes 71, GRCh37.p10), 15,017 of 62,252 genes (about 24%) are annotated with a pseudogene-related biotype.

Pseudogene-related gene biotypes: IG_C_pseudogene, IG_J_pseudogene, IG_V_pseudogene, polymorphic_pseudogene, processed_pseudogene, pseudogene, TR_J_pseudogene and TR_V_pseudogene.

You can easily filter for the required gene or transcript biotypes by gene or protein ID in BioMart.
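BioMart can also be queried outside the web interface through its REST endpoint, which, as far as I know, accepts the same query XML that the web interface exports. A minimal Python sketch that builds such a URL for a list of Ensembl gene IDs restricted to chosen biotypes. The endpoint URL, dataset name (`hsapiens_gene_ensembl`), filter names (`ensembl_gene_id`, `biotype`) and attribute names are assumptions based on the current Ensembl mart and have changed between releases (Ensembl 71 may differ), so compare them with the XML exported from a query built in the web interface; for protein IDs you would swap in the matching peptide-ID filter.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from collections.abc import Iterable

# Assumed endpoint; check it against the Ensembl BioMart documentation.
MARTSERVICE = "https://www.ensembl.org/biomart/martservice"


def biomart_url(gene_ids: Iterable[str], biotypes: Iterable[str],
                dataset: str = "hsapiens_gene_ensembl") -> str:
    """Build a BioMart REST URL returning header-less TSV rows of gene ID,
    gene name and gene biotype, restricted to the given IDs and biotypes."""
    query = ET.Element("Query", virtualSchemaName="default",
                       formatter="TSV", header="0", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    # Comma-separated values act as an OR within a single filter.
    ET.SubElement(ds, "Filter", name="ensembl_gene_id", value=",".join(gene_ids))
    ET.SubElement(ds, "Filter", name="biotype", value=",".join(biotypes))
    for attr in ("ensembl_gene_id", "external_gene_name", "gene_biotype"):
        ET.SubElement(ds, "Attribute", name=attr)
    xml = ET.tostring(query, encoding="unicode")
    return MARTSERVICE + "?" + urllib.parse.urlencode({"query": xml})


# Fetching the result needs network access, e.g.:
# import urllib.request
# tsv = urllib.request.urlopen(biomart_url(my_gene_ids, ["pseudogene"])).read().decode()
```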

Here is a screenshot based on my query: [screenshot not available]

Answering the revised question (pseudogenes of OPTN):

Yes, you can query BioMart using gene symbols and check whether any gene biotype or transcript biotype belongs to a pseudogene category.
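A minimal Python sketch of that check, assuming BioMart was asked (with the header row turned off) for three columns per row: gene name, gene biotype and transcript biotype. It treats any biotype ending in "pseudogene" as pseudogene-related, a simplification that covers every biotype listed in this thread; both function names are hypothetical.

```python
import csv
from collections.abc import Iterable


def is_pseudogene_biotype(biotype: str) -> bool:
    # Simplification: every pseudogene biotype named in this thread
    # (processed_pseudogene, IG_C_pseudogene, ..., TR_V_pseudogene) ends this way.
    return biotype.endswith("pseudogene")


def pseudogene_biotypes_by_gene(tsv_lines: Iterable[str]) -> dict[str, set[str]]:
    """Collect pseudogene-related biotypes per gene from header-less BioMart
    TSV rows assumed to hold: gene name, gene biotype, transcript biotype."""
    hits: dict[str, set[str]] = {}
    for row in csv.reader(tsv_lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or truncated lines
        name, gene_type, tx_type = row[:3]
        found = {b for b in (gene_type, tx_type) if is_pseudogene_biotype(b)}
        # setdefault keeps genes with no hits, mapped to an empty set
        hits.setdefault(name, set()).update(found)
    return hits
```

An empty set for a gene means none of its gene or transcript biotypes is pseudogene-related, which is the situation described here for OPTN in Ensembl 71.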

For your specific gene, OPTN, none of its genes or transcripts has a pseudogene biotype in the Ensembl Genes 71 (GRCh37.p10) release.



Dear Khader, thank you so much for your answer, but I am already aware that I can find a list of all of the pseudogenes in a certain genome.

Let me rephrase my question: given a gene (say Optn), is there an easy way to find its related pseudogenes?
