Find 'true' gene for a pseudogene?
8.8 years ago
biocyberman ▴ 870

Finding the pseudogenes of a given gene has already been discussed here: How To Find Pseudogenes Of A Given Protein?. I want to do the reverse: I am using BLAST to find the true gene for a pseudogene. For example, the closest match (96% identity) to the following gene is Mup4:

http://www.ensembl.org/Rattus_norvegicus/Gene/Summary?db=core;g=ENSRNOG00000051074;r=5:77457826-77457936;t=ENSRNOT00000076471

Is there a quicker way to do this, e.g. a batch query with BioMart?

My guess is that ENSRNOT00000076471 cannot be found with BioMart because it is marked as "Known unprocessed pseudogene". Or am I misunderstanding the word 'unprocessed' there?

Update:

Just a comment for myself: the BLAST-based method is not reliable, because it only reports the local match and its % identity.
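
One way to make the local-match problem visible is to look at query coverage alongside % identity. Below is a minimal sketch in base R. It assumes the BLAST run used a tabular format that includes the query length (for example `-outfmt "6 std qlen"`). The hit table, the IDs, and the 0.8 cutoff are invented for illustration and are not real results.

```r
# Hypothetical BLAST hits for one pseudogene query (values invented for illustration).
hits <- data.frame(
  qseqid = c("pseudo1", "pseudo1", "pseudo1"),
  sseqid = c("geneA", "geneB", "geneC"),
  pident = c(96.0, 99.0, 88.5),
  qstart = c(1, 40, 5),
  qend   = c(111, 70, 105),
  qlen   = c(111, 111, 111),
  stringsAsFactors = FALSE
)

# Fraction of the query sequence covered by each local alignment.
hits$qcov <- (hits$qend - hits$qstart + 1) / hits$qlen

# Keep hits that cover most of the query, then rank them by % identity.
min_cov <- 0.8  # arbitrary threshold chosen for this sketch
good <- hits[hits$qcov >= min_cov, ]
good <- good[order(-good$pident), ]
print(good[, c("sseqid", "pident", "qcov")])
```

Here geneB has the highest % identity but aligns to only a short stretch of the query, so the coverage filter drops it.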

ensembl biomart • 2.2k views
8.8 years ago

Once you have a list of candidates, you can look up their status with the Ensembl API: $gene->biotype returns protein_coding for coding genes. Emily's answer (and code) to the question you linked to should get you started.


Are you saying that I can generate a list/dictionary of TrueGene => Pseudogenes following the code and then filter the list? I am trying to do it in biomaRt (R package), but I don't know which attributes to include. Here is the snippet.

    library(biomaRt)

    # Connect to the rat gene dataset in Ensembl BioMart
    ensembl <- useMart("ensembl", dataset = "rnorvegicus_gene_ensembl")
    atrs <- c("ensembl_gene_id", "ensembl_transcript_id", "external_gene_name",
              "rgd_symbol", "transcript_biotype")
    rs <- getBM(attributes = atrs, mart = ensembl)

    # Keep only transcripts whose biotype mentions "pseudogene"
    rss <- subset(rs, grepl("pseudogene", transcript_biotype))
    head(rss)
    nrow(rss)

I don't know whether the Ensembl BioMart database gives you this kind of information, and I am not familiar with biomaRt because I use the Ensembl Perl API for this kind of thing. What I meant is that once you've collected a list of candidates with BLAST or other means, you could extract the genes that are protein-coding. You could use Emily's code from the other post as a starting point to look for protein-coding genes in your list instead of pseudogenes.
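
The filtering step described above could look roughly like this in base R. This is only a sketch: the candidate table, its column names, and all IDs are hypothetical, and it assumes a biotype has already been attached to each hit by whatever means you prefer (the Perl API's `$gene->biotype`, or a lookup table you build yourself).

```r
# Hypothetical candidate table: one row per (pseudogene, BLAST hit) pair.
candidates <- data.frame(
  pseudogene = c("pg1", "pg1", "pg2", "pg2"),
  hit_gene   = c("gA", "gB", "gC", "gD"),
  pident     = c(96, 91, 85, 93),
  biotype    = c("protein_coding", "unprocessed_pseudogene",
                 "protein_coding", "protein_coding"),
  stringsAsFactors = FALSE
)

# Keep only the hits annotated as protein-coding genes.
coding <- candidates[candidates$biotype == "protein_coding", ]

# For each pseudogene, keep the protein-coding hit with the highest % identity.
coding <- coding[order(coding$pseudogene, -coding$pident), ]
best <- coding[!duplicated(coding$pseudogene), ]
print(best[, c("pseudogene", "hit_gene", "pident")])
```

Picking the single best hit per pseudogene is a simplification. For gene families with several close paralogs you may want to keep all hits above a threshold instead.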
