Finding the pseudogenes of a given gene is already discussed here: How To Find Pseudogenes Of A Given Protein?. I want to do the reverse: I am using BLAST to find the true (parent) gene of a pseudogene. For example, the closest match (96% identity) to the following gene is Mup4
Is there a quicker way to do this, e.g. a batch query with BioMart?
I suspect that ENSRNOT00000076471 cannot be found with BioMart because it is marked as "Known unprocessed pseudogene". Or am I misunderstanding the word 'unprocessed' there?
Update:
Just a comment for myself: the BLAST-based method is not reliable, because it only reports the local match and its % identity.
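To illustrate the problem, here is a minimal Python sketch that puts % identity next to query coverage, so a short, near-identical local match does not automatically win. It assumes BLAST tabular output with the default 12 columns (`-outfmt 6`) and that the query length is known separately. The hit lines, the names `pseudo1`, `geneA` and `geneB`, and the identity-times-coverage ranking are made up for illustration only, not a validated scoring scheme.

```python
def parse_hit(line, query_length):
    """Parse one line of BLAST tabular output (-outfmt 6, default columns).

    Default column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    fields = line.rstrip("\n").split("\t")
    qstart, qend = int(fields[6]), int(fields[7])
    aligned = abs(qend - qstart) + 1  # number of query positions in the alignment
    return {
        "subject": fields[1],
        "pident": float(fields[2]),
        "query_coverage": 100.0 * aligned / query_length,
    }


# Hypothetical hits for a 1000-nt pseudogene query (not real data).
lines = [
    "pseudo1\tgeneA\t96.0\t200\t8\t0\t1\t200\t501\t700\t1e-90\t330",
    "pseudo1\tgeneB\t90.0\t950\t95\t0\t1\t950\t11\t960\t0.0\t1200",
]
hits = [parse_hit(line, query_length=1000) for line in lines]

# Illustrative ranking only: weight identity by how much of the query is covered.
best = max(hits, key=lambda h: h["pident"] * h["query_coverage"])

for h in hits:
    print(h["subject"], h["pident"], round(h["query_coverage"], 1))
print("best by identity x coverage:", best["subject"])
```

In this toy case geneA has the higher % identity but aligns to only a fifth of the query, while geneB aligns to almost all of it.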
Are you saying that I can generate a list/dictionary of TrueGene => Pseudogenes following the code and then filter the list? I am trying to do it in biomaRt (R package), but I don't know which attributes to include. Here is the snippet.
I don't know whether the Ensembl BioMart database gives you this kind of information, and I am not familiar with biomaRt because I use the Ensembl Perl API for this kind of thing. What I meant is that once you've collected a list of candidates with BLAST or by other means, you could extract the genes in that list that are protein-coding. You could use Emily's code from the other post as a starting point, looking for protein-coding genes in your list instead of pseudogenes.
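The filtering step could look roughly like the Python sketch below. It is not the Ensembl API or biomaRt: it assumes you already have the candidate pairs (for example from BLAST) and a gene-to-biotype lookup exported from Ensembl by whatever means. The function name `protein_coding_candidates` and all gene identifiers are hypothetical; `protein_coding` and `unprocessed_pseudogene` are the biotype labels as Ensembl writes them.

```python
from collections import defaultdict


def protein_coding_candidates(candidate_pairs, biotypes):
    """Keep only candidates whose biotype is 'protein_coding'.

    candidate_pairs: iterable of (pseudogene_id, candidate_gene_id) tuples.
    biotypes: dict mapping gene id -> biotype string.
    Returns a dict mapping pseudogene id -> list of protein-coding candidates.
    """
    result = defaultdict(list)
    for pseudogene, candidate in candidate_pairs:
        # Candidates missing from the lookup are skipped rather than guessed.
        if biotypes.get(candidate) == "protein_coding":
            result[pseudogene].append(candidate)
    return dict(result)


# Hypothetical identifiers, for illustration only.
pairs = [
    ("pseudoX", "geneA"),
    ("pseudoX", "geneB"),
    ("pseudoY", "geneC"),
]
biotypes = {
    "geneA": "protein_coding",
    "geneB": "unprocessed_pseudogene",
    "geneC": "protein_coding",
}

parents = protein_coding_candidates(pairs, biotypes)
print(parents)
```

A pseudogene with no protein-coding candidate simply does not appear in the result, which makes those cases easy to spot and check by hand.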