Question

Find 'true' gene for a pseudogene?


8.7 years ago

biocyberman ▴ 870

Searching for the pseudogenes of a gene has already been discussed here: How To Find Pseudogenes Of A Given Protein?. I want to do the reverse: I am using BLAST to find the true gene for a pseudogene. For example, the closest match (96%) to the following gene is Mup4:

http://www.ensembl.org/Rattus_norvegicus/Gene/Summary?db=core;g=ENSRNOG00000051074;r=5:77457826-77457936;t=ENSRNOT00000076471

Is there a quicker way to do this, e.g. a batch query with BioMart?

I suspect that ENSRNOT00000076471 cannot be found with BioMart because it is marked as "Known unprocessed pseudogene". Or am I misunderstanding the word 'unprocessed' there?

Update:

Just a note to myself: the BLAST-based method is not reliable, because it only reports local matches and their % identity.
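To illustrate the point about local matches: a hit can show high identity while covering only part of the query, so it may help to look at identity and query coverage together. Below is a minimal sketch in R, assuming BLAST+ tabular output requested with the columns `qseqid sseqid pident qstart qend qlen`. The hit values and the names `geneA`/`geneB` are invented for illustration, and the identity × coverage score is only one possible heuristic, not an established method.

```r
# Hypothetical BLAST hits for a single 111 bp query (values invented for illustration).
# Columns mirror the BLAST+ tabular specifiers: qseqid sseqid pident qstart qend qlen
hits <- data.frame(
  qseqid = c("query1", "query1"),
  sseqid = c("geneA", "geneB"),
  pident = c(96.0, 88.0),
  qstart = c(1, 1),
  qend   = c(50, 111),
  qlen   = c(111, 111),
  stringsAsFactors = FALSE
)

# Percentage of the query covered by each local alignment
hits$qcov <- 100 * (hits$qend - hits$qstart + 1) / hits$qlen

# Illustrative combined score: identity weighted by coverage (an assumption, not a standard)
hits$score <- hits$pident * hits$qcov / 100

# Pick the hit with the highest combined score
best <- hits[which.max(hits$score), ]
print(best[, c("sseqid", "pident", "qcov", "score")])
```

With these made-up numbers, the 96% hit covers under half of the query, so the lower-identity but full-length hit ranks first.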

ensembl biomart • 2.2k views

updated 8.7 years ago by Jean-Karim Heriche 27k • written 8.7 years ago by biocyberman ▴ 870

score 0 · Answer 1 · 2016-02-29


8.7 years ago

Jean-Karim Heriche 27k

Once you have a list of candidates, you can look up their status with the Ensembl API: $gene->biotype returns protein_coding for coding genes. Emily's answer (and code) to the question you linked to should get you started.



Are you saying that I can generate a list/dictionary of TrueGene => Pseudogenes by following that code and then filter the list? I am trying to do it in biomaRt (the R package), but I don't know which attributes to include. Here is my snippet so far:

    library(biomaRt)

    # Connect to the rat gene dataset in Ensembl BioMart
    ensembl <- useMart("ensembl", dataset = "rnorvegicus_gene_ensembl")
    atrs <- c("ensembl_gene_id", "ensembl_transcript_id", "external_gene_name",
              "rgd_symbol", "transcript_biotype")
    # Fetch all transcripts, then keep those whose biotype mentions "pseudogene"
    rs <- getBM(attributes = atrs, mart = ensembl)
    rss <- subset(rs, grepl("pseudogene", transcript_biotype))
    head(rss)
    nrow(rss)

8.7 years ago by biocyberman ▴ 870


I don't know whether the Ensembl BioMart database gives you this kind of information, and I am not familiar with biomaRt because I use the Ensembl Perl API for this kind of thing. What I meant is that once you've collected a list of candidates with BLAST or other means, you could extract those genes that are protein-coding. You could use Emily's code from the other post as a starting point to look for protein-coding genes in your list instead of pseudogenes.
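For readers who prefer R over the Perl API, the filtering step described above might look roughly like the sketch below. It assumes that the Ensembl mart exposes a `gene_biotype` attribute and an `external_gene_name` filter. The helper names `fetch_biotypes` and `keep_protein_coding` and the example table are hypothetical, introduced only for illustration.

```r
# Fetch biotypes for candidate gene symbols (defined but not run here; needs
# network access). Assumes the "external_gene_name" filter and "gene_biotype"
# attribute are available in the chosen Ensembl mart.
fetch_biotypes <- function(symbols, mart) {
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "external_gene_name", "gene_biotype"),
    filters    = "external_gene_name",
    values     = symbols,
    mart       = mart
  )
}

# Keep only the protein-coding candidates from an annotation table
keep_protein_coding <- function(annot) {
  annot[annot$gene_biotype == "protein_coding", , drop = FALSE]
}

# Hypothetical table shaped like the getBM() result above (made-up entries)
annot <- data.frame(
  external_gene_name = c("geneA", "geneB", "geneC"),
  gene_biotype       = c("protein_coding", "unprocessed_pseudogene", "protein_coding"),
  stringsAsFactors   = FALSE
)

coding <- keep_protein_coding(annot)
print(coding$external_gene_name)
```

With a live connection, the two helpers could be chained as `keep_protein_coding(fetch_biotypes(candidates, ensembl))`, where `candidates` is a character vector of gene symbols from the BLAST hits and `ensembl` is a mart object created with `useMart()`.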

8.7 years ago by Jean-Karim Heriche 27k