My question is rather trivial:
Is there a resource I can use to find the pseudogenes of my favorite set of genes using only identifiers (that is, without having to do a sequence comparison)?
Best regards
One option is to use the Ensembl API. You could write a Perl script that looks up your gene in the database, retrieves all of the gene's transcripts, and selects those with the biotype pseudogene.
There are instructions on downloading the API here:
http://www.ensembl.org/info/docs/api/api_installation.html
There's a tutorial on using the API here:
http://www.ensembl.org/info/docs/api/core/core_tutorial.html
The documentation is here:
http://www.ensembl.org/info/docs/Doxygen/core-api/index.html
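If you would rather not install the Perl API, the same idea (look up the gene, list its transcripts, keep those with a pseudogene biotype) can be sketched in Python against the public Ensembl REST service. This swaps the Perl API for the REST `lookup/symbol` endpoint, so treat it as a rough sketch rather than the approach described above. The function names are made up for illustration, and the substring test on the biotype is a simplification.

```python
import json
import urllib.request

REST_SERVER = "https://rest.ensembl.org"  # public Ensembl REST service


def fetch_gene(symbol, species="homo_sapiens"):
    """Fetch one gene record, with its transcripts expanded, as a dict.

    Needs network access; not called anywhere in this sketch.
    """
    url = f"{REST_SERVER}/lookup/symbol/{species}/{symbol}?expand=1"
    request = urllib.request.Request(
        url, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def pseudogene_transcripts(gene_record):
    """Return the IDs of transcripts whose biotype mentions 'pseudogene'.

    Assumes the record carries a 'Transcript' list whose entries have
    'id' and 'biotype' keys, as returned when expand=1 is set.
    """
    return [
        transcript["id"]
        for transcript in gene_record.get("Transcript", [])
        if "pseudogene" in transcript.get("biotype", "")
    ]
```

Something like `pseudogene_transcripts(fetch_gene("OPTN"))` would then give the list for one gene. Note that this only inspects the transcripts of the gene you look up, just like the Perl approach.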
Let me know if you need any help with this.
What about pseudogene.org?
Welcome to Pseudogene.org. The site is developed and maintained by Yale Gerstein Group. This site contains a comprehensive database of identified pseudogenes, utilities used to find pseudogenes, various publication data sets and a pseudogene knowledgebase.
You can download the full gene list, or the per-chromosome lists, in csv/gtf format and then cross-reference them with your custom list using R, Perl or Python.
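A minimal Python sketch of that cross-referencing step might look like the following. The column names `pseudogene_id` and `parent_gene` are hypothetical, and so are the sample rows. Check the header of the file you actually download and pass the real column names and delimiter.

```python
import csv
import io
from collections import defaultdict


def pseudogenes_by_parent(handle, my_genes, parent_col="parent_gene",
                          id_col="pseudogene_id", delimiter="\t"):
    """Group pseudogene IDs by parent gene, keeping only genes of interest.

    parent_col and id_col are assumed column names, not the real headers
    of any particular download.
    """
    wanted = set(my_genes)
    hits = defaultdict(list)
    for row in csv.DictReader(handle, delimiter=delimiter):
        parent = row[parent_col]
        if parent in wanted:
            hits[parent].append(row[id_col])
    return dict(hits)


# Made-up rows standing in for a downloaded pseudogene table.
SAMPLE = (
    "pseudogene_id\tparent_gene\n"
    "PG0001\tGENE_A\n"
    "PG0002\tGENE_B\n"
    "PG0003\tGENE_A\n"
)

demo = pseudogenes_by_parent(io.StringIO(SAMPLE), ["GENE_A", "GENE_C"])
```

For a real file you would pass `open(path, newline="")` instead of the `io.StringIO` wrapper. Genes from your list with no match are simply absent from the result.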
In the current Ensembl Genes 71 release (GRCh37.p10), 15017 of 62252 genes are annotated with a pseudogene-related biotype.
Pseudogene-related gene biotypes: IG_C_pseudogene, IG_J_pseudogene, IG_V_pseudogene, polymorphic_pseudogene, processed_pseudogene, pseudogene, TR_J_pseudogene and TR_V_pseudogene.
You can easily filter for the required gene/transcript biotype by gene or protein ID using BioMart.
Here is a screenshot based on my query:
Answering the revised question - Pseudogenes of OPTN:
Yes, you can query BioMart using gene symbols and check whether any gene biotype or transcript biotype belongs to a pseudogene category.
For your specific gene, OPTN, the Ensembl Genes 71/GRCh37.p10 release does not annotate the gene or any of its transcripts as a pseudogene.
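For anyone who prefers to script this check instead of clicking through the web interface, here is a rough Python sketch that builds a BioMart XML query for a list of gene symbols and then flags returned rows whose biotype is in the list given above. The dataset name, the `hgnc_symbol` filter and the attribute names are assumptions based on the Ensembl BioMart naming. They can differ between releases, so verify them in the BioMart interface for the release you use.

```python
import csv
import io
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

MARTSERVICE = "http://www.ensembl.org/biomart/martservice"

# Assumed attribute names; the result columns come back in this order.
ATTRIBUTES = ("hgnc_symbol", "ensembl_gene_id", "gene_biotype",
              "ensembl_transcript_id", "transcript_biotype")

# Pseudogene-related biotypes as listed in the answer above.
PSEUDOGENE_BIOTYPES = {
    "IG_C_pseudogene", "IG_J_pseudogene", "IG_V_pseudogene",
    "polymorphic_pseudogene", "processed_pseudogene", "pseudogene",
    "TR_J_pseudogene", "TR_V_pseudogene",
}


def build_query(symbols, dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query (TSV output, no header row)."""
    attrs = "".join(f"<Attribute name={quoteattr(a)}/>" for a in ATTRIBUTES)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" '
        'header="0" uniqueRows="1">'
        f'<Dataset name={quoteattr(dataset)} interface="default">'
        f'<Filter name="hgnc_symbol" value={quoteattr(",".join(symbols))}/>'
        f"{attrs}</Dataset></Query>"
    )


def query_url(symbols):
    """URL that submits the query to the martservice endpoint."""
    return MARTSERVICE + "?" + urlencode({"query": build_query(symbols)})


def pseudogene_rows(tsv_text):
    """Return result rows whose gene or transcript biotype is a pseudogene type."""
    rows = (dict(zip(ATTRIBUTES, fields))
            for fields in csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    return [
        row for row in rows
        if row.get("gene_biotype") in PSEUDOGENE_BIOTYPES
        or row.get("transcript_biotype") in PSEUDOGENE_BIOTYPES
    ]
```

Fetching `query_url(["OPTN"])` (for example with `urllib.request.urlopen`) and passing the decoded text to `pseudogene_rows` should return an empty list if no pseudogene biotype is attached to the gene. Because the header row is switched off, the columns are matched by the order of `ATTRIBUTES` rather than by display names.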
While some pseudogenes are transcribed, most are not. I'm not sure where that leaves the Ensembl transcripts. HGNC curates some pseudogenes, but not many. You could cluster hypothetical ORFs, but as most should have frameshifts, stops or other transcription/translation breakers, that won't be easy.
This script works:
Edit it to put in different genes, change what you print out, etc. Also, check the possible biotypes (which, as Khader says, you can find in BioMart) and add any more that you think are relevant to your search.