Question

Pseudogenes in the human genome annotation

1

4.1 years ago

Sergio Martínez Cuesta ▴ 230

Hi everyone,

I was wondering whether anyone is familiar with an annotation term in the human gene annotation, e.g. from GENCODE or Ensembl, that can be used to extract pseudogenes and separate them from non-pseudogenes.

Any thoughts?

Thanks in advance, Sergio

human genome GRCh38 hg38 annotation • 2.9k views

updated 3.2 years ago by Michael 55k • written 4.1 years ago by Sergio Martínez Cuesta ▴ 230

2

The GENCODE annotation contains a gene_type attribute, which you can query for "pseudogene".
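As a minimal sketch of that idea, the snippet below splits the gene records of a GENCODE-style GTF into pseudogenes and everything else by looking for "pseudogene" in the gene_type attribute. The function names and the two example records are hypothetical and only serve as illustration. For an Ensembl GTF the equivalent attribute is gene_biotype, which can be passed as `key`.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def gene_type(gtf_line, key="gene_type"):
    """Return the value of `key` from column 9 of a GTF line, or None."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None
    attrs = dict(ATTR_RE.findall(fields[8]))
    return attrs.get(key)


def split_genes(lines, key="gene_type"):
    """Split GTF 'gene' records into (pseudogenes, others)."""
    pseudo, other = [], []
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene-level records
        gtype = gene_type(line, key)
        if gtype and "pseudogene" in gtype:
            pseudo.append(line)
        else:
            other.append(line)
    return pseudo, other


# Hypothetical records, not taken from a real annotation file.
example = [
    'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_type "processed_pseudogene"; gene_name "A";',
    'chr1\tHAVANA\tgene\t300\t400\t.\t-\t.\tgene_id "G2"; gene_type "protein_coding"; gene_name "B";',
]
pseudo, other = split_genes(example)
print(len(pseudo), len(other))
```

To run this on a real file, pass an open file handle (for example from `gzip.open(path, "rt")`) instead of the example list.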

4.1 years ago by ATpoint 86k

1

Does this also work with an Ensembl GFF?

4.1 years ago by Shred ★ 1.6k

1

3.2 years ago

Luis Nassar ▴ 670

Hello,

You can also use the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) to extract the pseudogenes from our default gene track.

First make the following selections (for hg38):

[Screenshot: Table Browser selection]

Then select the filter button and type pseudogene for transcriptClass:

[Screenshot: filter for pseudogene]

The output for the whole genome will be 18,578 annotations, from the GENCODE V36 models.

If you have any follow-up questions, our public help desk can always be reached at genome@soe.ucsc.edu. You may also send questions to genome-www@soe.ucsc.edu if they contain sensitive data. For any Genome Browser questions on Biostars, the UCSC tag is the best way to ensure visibility by the team.

3.2 years ago by Luis Nassar ▴ 670

score 2 · Accepted Answer · 2020-11-14

2

4.1 years ago

Michael 55k

You can use Ensembl BioMart with the following query:

very long BioMart query..... Modify parameters as you like

There are many subtypes of pseudogenes; the query outputs the gene type in the last column.
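To see which subtypes are present in the exported results, one could tally the last column. This is only a sketch: it assumes a tab-separated export with a header row, and the function name and example rows are hypothetical.

```python
import csv
import io
from collections import Counter


def count_gene_types(tsv_text, has_header=True):
    """Count the values in the last column of a tab-separated export."""
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    if has_header:
        rows = rows[1:]  # drop the column names
    return Counter(r[-1] for r in rows)


# Hypothetical rows; a real export would be read from the downloaded file.
example_tsv = (
    "Gene stable ID\tGene name\tGene type\n"
    "G1\tA\tprocessed_pseudogene\n"
    "G2\tB\tunprocessed_pseudogene\n"
    "G3\tC\tprocessed_pseudogene\n"
)
for gene_type_name, n in count_gene_types(example_tsv).most_common():
    print(gene_type_name, n)
```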

3.2 years ago by Michael 55k

1

Thank you for the link! Would you mind explaining the query a bit more? I do not understand how one gets the list of pseudogenes by modifying only the attributes.

3.2 years ago by linmarnor ▴ 10

0

The link leads to a preset that covers all types of pseudogenes. It uses a Filter setting for "gene type" with every type selected whose name contains "pseudogene", such as "translated_processed_pseudogene", "translated_unprocessed_pseudogene", etc. The link is meant as a starting point: you can adjust the filter criteria to restrict the results to particular subsets of pseudogenes, or modify the attributes to extract different data columns or sequences. It is best to simply try it out.

All settings and filters are encoded in the URL and correctly applied by BioMart. However, there seems to be a bug that prevents the filter settings encoded in the URL from being displayed correctly in the web interface under "Filters". This behavior wasn't there when I posted this answer. If you check the results, they are correct anyway and contain only *pseudogene gene types.

To change the filter settings, click on Filter (to the left) -> check "Gene types" -> and select all gene types that you wish to include.
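For anyone who prefers to script this, the same selection can be written as a BioMart XML query. The sketch below only builds the XML string and does not contact the server. The dataset name (hsapiens_gene_ensembl), the filter name (biotype) and the attribute names are my assumptions about the current Ensembl mart, so please check them against the XML that the BioMart results page exports before relying on them. The function name and the short list of gene types are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_pseudogene_query(gene_types, attributes):
    """Build a BioMart-style XML query that filters on *pseudogene types."""
    selected = [t for t in gene_types if t.endswith("pseudogene")]
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    # Assumed dataset name for human genes.
    dataset = ET.SubElement(query, "Dataset", {
        "name": "hsapiens_gene_ensembl",
        "interface": "default",
    })
    # Assumed filter name; the value is a comma-separated list of gene types.
    ET.SubElement(dataset, "Filter", {
        "name": "biotype",
        "value": ",".join(selected),
    })
    for attribute in attributes:
        ET.SubElement(dataset, "Attribute", {"name": attribute})
    return ET.tostring(query, encoding="unicode")


# Hypothetical subset of gene types; the gene type attribute is listed last
# so that it ends up in the last output column.
xml_query = build_pseudogene_query(
    ["protein_coding", "processed_pseudogene", "unprocessed_pseudogene"],
    ["ensembl_gene_id", "external_gene_name", "gene_biotype"],
)
print(xml_query)
```

Changing the list of gene types corresponds to changing the selection under "Gene types" in the web interface, and changing the attribute list changes the output columns.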

3.2 years ago by Michael 55k