Hi everyone,
I was wondering if anyone is familiar with an annotation term in human gene annotations (e.g. from GENCODE or Ensembl) that can be used to extract pseudogenes and separate them from non-pseudogenes.
Any thoughts?
Thanks in advance, Sergio
You can use Ensembl BioMart with the following query:
very long BioMart query..... Modify parameters as you like
There are many subtypes of pseudogenes; the query outputs the gene type in the last column.
The link leads to a preset that covers all types of pseudogenes: it uses a Filter setting for "gene type" and selects every type that contains "pseudogene", such as "translated_processed_pseudogene", "translated_unprocessed_pseudogene", etc. The link is meant as a starting point. You can adjust the filter criteria to restrict the results to particular subsets of pseudogenes, or modify the attributes to extract different data columns or sequences. It is best to simply try it out.
All settings and filters are encoded in the URL and are applied correctly by BioMart. However, there seems to be a bug that prevents the filter settings encoded in the URL from being displayed correctly in the web interface under "Filters". This behavior wasn't there when I posted this answer. If you check the results, they are correct anyway and contain only *pseudogene gene types.
To change the filter settings, click on Filter (on the left), check "Gene types", and select all gene types that you wish to include.
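If you would rather script this than click through the web interface, here is a minimal sketch that builds a BioMart XML query with a gene type filter. It is not the query behind the link above. The dataset name (hsapiens_gene_ensembl), the filter name (biotype), the attribute names and the martservice URL are assumptions based on what the Ensembl BioMart service commonly exposes, and the biotype list is only an illustrative subset, so check them against the current BioMart configuration. The sketch only builds the URL; it does not send the request.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint of the Ensembl BioMart service; verify before use.
MARTSERVICE = "https://www.ensembl.org/biomart/martservice"

# Illustrative subset of pseudogene gene types, not a complete list.
PSEUDOGENE_TYPES = [
    "processed_pseudogene",
    "unprocessed_pseudogene",
    "translated_processed_pseudogene",
    "translated_unprocessed_pseudogene",
]


def build_biomart_query(biotypes):
    """Return a BioMart XML query string filtering genes by gene type."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    # Dataset, filter and attribute names below are assumptions.
    dataset = ET.SubElement(query, "Dataset", {
        "name": "hsapiens_gene_ensembl",
        "interface": "default",
    })
    ET.SubElement(dataset, "Filter", {
        "name": "biotype",
        "value": ",".join(biotypes),
    })
    # Gene type goes last, matching the column order described above.
    for attribute in ("ensembl_gene_id", "external_gene_name", "gene_biotype"):
        ET.SubElement(dataset, "Attribute", {"name": attribute})
    return ET.tostring(query, encoding="unicode")


xml_query = build_biomart_query(PSEUDOGENE_TYPES)
# URL that could be fetched with any HTTP client to download the table.
query_url = MARTSERVICE + "?" + urlencode({"query": xml_query})
print(query_url)
```

Editing PSEUDOGENE_TYPES plays the same role as ticking boxes under "Gene types" in the web interface.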
Hello,
You can also use the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) to extract the pseudogenes from our default gene track.
First make the following selections (for hg38):
And then select the filter button, and type pseudogene for transcriptClass:
The output for the whole genome will be 18,578 annotations, from the GENCODE V36 models.
If you have any follow-up questions, our public help desk can always be reached at genome@soe.ucsc.edu. You may also send questions to genome-www@soe.ucsc.edu if they contain sensitive data. For any Genome Browser questions on Biostars, the UCSC tag is the best way to ensure visibility by the team.
GENCODE contains gene_type, which you can query for pseudogene.
Is it ok with an Ensembl gff?
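For a local annotation file, the gene_type idea can be sketched in a few lines of Python. This assumes a GTF file: GENCODE GTFs carry the type in the gene_type attribute, while Ensembl GTFs use gene_biotype, so the sketch accepts either. An Ensembl GFF3 stores attributes as key=value pairs (as far as I know under biotype=), so the regular expression would need adapting for that format. The function name split_genes and the example records are made up for illustration.

```python
import re

# GENCODE GTFs use gene_type; Ensembl GTFs use gene_biotype.
# Assumption: GTF-style attributes of the form key "value";
BIOTYPE_RE = re.compile(r'\b(?:gene_type|gene_biotype) "([^"]+)"')


def split_genes(lines):
    """Split GTF 'gene' records into (pseudogenes, others) by gene type."""
    pseudo, other = [], []
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep gene-level records only
        match = BIOTYPE_RE.search(fields[8])
        if match is None:
            continue  # no gene type attribute found
        # Every pseudogene subtype contains the substring "pseudogene".
        if "pseudogene" in match.group(1):
            pseudo.append(line)
        else:
            other.append(line)
    return pseudo, other


# Made-up records for illustration only, not real annotations.
example = [
    "##description: toy example",
    'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "GENE_A"; gene_type "processed_pseudogene";',
    'chr1\tHAVANA\tgene\t300\t400\t.\t-\t.\tgene_id "GENE_B"; gene_type "protein_coding";',
    '1\tensembl\tgene\t500\t600\t.\t+\t.\tgene_id "GENE_C"; gene_biotype "unprocessed_pseudogene";',
    'chr1\tHAVANA\ttranscript\t100\t200\t.\t+\t.\tgene_id "GENE_A"; gene_type "processed_pseudogene";',
]

pseudogenes, non_pseudogenes = split_genes(example)
print(len(pseudogenes), len(non_pseudogenes))
```

For a real file, pass an open file handle instead of the example list, e.g. split_genes(open(path)) with path pointing at the decompressed GTF.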