I need to predict pseudogenes in a eukaryotic species for which I have the genome sequence. Gene prediction has already been done.
I have learned that there are two programs, Pseudopipe (paper) and PPFINDER. However, Pseudopipe needs some information from the Ensembl MySQL database, and I did not use the Ensembl pipeline to predict genes. I also have problems using PPFINDER, which needs synteny information with another species.
I would like to know whether there is any convenient way to predict pseudogenes. Thank you!
I have never used either of these programs, but some simple methods can help you predict some pseudogenes. Pseudogenes arise from protein-coding genes, so you can scan your genome for sequences homologous to the known protein-coding genes it contains (basically, looking for non-annotated paralogous sequences). This can be done with tools such as BLAST, BLAT, etc.
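A minimal sketch of the homology-scan step, assuming you ran something like `tblastn -query proteins.fa -db genome -outfmt 6` (file names are placeholders), so hits come back in BLAST's default 12-column tabular format. The helper names and the e-value cutoff are illustrative choices, not part of any tool: the code keeps significant hits that do not overlap an existing gene model, i.e. candidate non-annotated paralogous copies.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    chrom: str
    start: int  # 1-based genomic start (always <= end)
    end: int
    evalue: float


def parse_blast_tab(lines):
    """Parse BLAST -outfmt 6 lines (default 12 columns) into Hit objects."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        sstart, send = int(f[8]), int(f[9])  # send < sstart for minus-strand hits
        hits.append(Hit(f[0], f[1], min(sstart, send), max(sstart, send), float(f[10])))
    return hits


def overlaps(a_start, a_end, b_start, b_end):
    return a_start <= b_end and b_start <= a_end


def unannotated_hits(hits, genes, max_evalue=1e-5):
    """Keep significant hits that overlap no annotated gene model.

    genes: dict mapping chromosome -> list of (start, end) of annotated genes.
    """
    keep = []
    for h in hits:
        if h.evalue > max_evalue:
            continue
        if any(overlaps(h.start, h.end, s, e) for s, e in genes.get(h.chrom, [])):
            continue  # this is (part of) an annotated gene, e.g. the query itself
        keep.append(h)
    return keep
```

In practice you would also merge overlapping hits from the same protein into one locus before checking the ORF.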
Once this is done, check whether these sequences have kept their protein-coding ability: typically, look for frameshifts or ORF disruptions, which can be due to mutations of start/stop codons or to indels. A sequence homologous to a protein-coding gene but lacking an intact ORF is likely to be a pseudogene.
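The ORF check can be sketched as translating the candidate region in the frame implied by its alignment to the parent gene and looking for premature stop codons. This assumes the standard genetic code and that `seq` covers the region matching the whole parent coding sequence; under that assumption the length test is only a crude frameshift hint, and a real analysis would rather inspect the alignment itself for indels. The function names are hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the usual TCAG-ordered table
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate seq from offset `frame`; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def premature_stops(seq, frame=0):
    """0-based codon indices of stop codons before the last codon."""
    protein = translate(seq, frame)
    return [i for i, aa in enumerate(protein[:-1]) if aa == "*"]


def looks_disrupted(seq, frame=0):
    """True if there is an internal stop, or if the region (assumed to span the
    whole parent CDS) is not a multiple of 3 long, a crude frameshift hint."""
    return bool(premature_stops(seq, frame)) or (len(seq) - frame) % 3 != 0
```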
You can then extend your analysis to include genes from other species. A pseudogene can also be the remnant of a single-copy gene (without paralogs) that has been lost in a given species. The method is the same, except that you look at orthologous rather than paralogous sequences: scan your genome of interest for sequences homologous to protein-coding genes present in more or less closely related species. An addition to this method is to look for synteny. If the sequence is quite degenerate but the flanking regions (ideally containing several protein-coding genes) are conserved, this will give you more confidence in your detection.
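The synteny check can be reduced to counting how many genes flanking the candidate have their ortholog next to the intact gene in the other species. This is a minimal sketch assuming you already have a one-to-one ortholog table (e.g. from reciprocal best BLAST hits) and lists of neighbouring gene IDs; the window size and the `min_shared` cutoff are arbitrary, hypothetical choices.

```python
def shared_flanking_orthologs(target_flank, other_flank, orthologs):
    """Count genes flanking the candidate in the target genome whose ortholog
    is among the genes flanking the intact gene in the other species.

    target_flank: gene IDs near the candidate (e.g. N genes on each side)
    other_flank:  gene IDs near the intact gene in the other species
    orthologs:    dict target gene ID -> ortholog ID in the other species
    """
    other = set(other_flank)
    return sum(1 for g in target_flank if orthologs.get(g) in other)


def synteny_support(target_flank, other_flank, orthologs, min_shared=2):
    """Return (number of shared flanking orthologs, whether it reaches min_shared)."""
    n = shared_flanking_orthologs(target_flank, other_flank, orthologs)
    return n, n >= min_shared
```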
A good confirmation of pseudogenes is to look at the dN/dS of these sequences. Pseudogenes generally evolve under neutral selection and therefore display a dN/dS close to 1. Nonetheless, this might be affected by how long the gene has been decaying.
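For the dN/dS check, the sketch below implements the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction on a pair of in-frame aligned coding sequences (e.g. the candidate and its intact parent). It is a simplified illustration: codons with gaps, ambiguous bases or stop codons are skipped (pseudogenes often contain stops), and changes to a stop codon count as nonsynonymous when counting sites. For real analyses, established programs such as PAML (`yn00`, `codeml`) are a better choice.

```python
import itertools
import math

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]  # standard genetic code
        for i, a in enumerate(BASES) for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Synonymous sites of a codon; changes to a stop count as nonsynonymous."""
    aa = CODE[codon]
    same = sum(CODE[codon[:p] + b + codon[p + 1:]] == aa
               for p in range(3) for b in BASES if b != codon[p])
    return same / 3.0


def path_differences(c1, c2):
    """Mean (syn, nonsyn) differences over all mutational paths from c1 to c2
    that avoid intermediate stop codons; None if every path hits a stop."""
    diff = [p for p in range(3) if c1[p] != c2[p]]
    paths = []
    for order in itertools.permutations(diff):
        cur, sd, nd, ok = c1, 0, 0, True
        for step, p in enumerate(order):
            nxt = cur[:p] + c2[p] + cur[p + 1:]
            if step < len(order) - 1 and CODE[nxt] == "*":
                ok = False
                break
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        if ok:
            paths.append((sd, nd))
    if not paths:
        return None
    return (sum(s for s, _ in paths) / len(paths),
            sum(n for _, n in paths) / len(paths))


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; infinite when p >= 0.75 (saturation)."""
    return math.inf if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Nei-Gojobori dN/dS for two equal-length, in-frame aligned CDSs."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # skip gapped/ambiguous codons and codons that are stops
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        d = path_differences(c1, c2)
        if d is None:
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S, N = S + s, N + 3 - s
        Sd, Nd = Sd + d[0], Nd + d[1]
    if S == 0 or N == 0:
        raise ValueError("no comparable codons")
    dS, dN = jukes_cantor(Sd / S), jukes_cantor(Nd / N)
    return dN / dS if dS > 0 else math.nan  # undefined without synonymous change
```

A ratio near 1 fits relaxed constraint, while a recently pseudogenized copy may still show a ratio well below 1.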
Here is a reference to a paper which uses a partially similar method (it is a bit more complex). It focuses on only one gene, but the approach can be used at a large scale.
+1. One comment: I found that building a nucleotide phylogenetic tree is quite informative. Pseudogenes tend to break the molecular clock and sit on long branches. The tree-based method seems to have more power than dN/dS.
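A full tree is hard to show in a few lines, so this sketch swaps in a closely related, simpler technique: Tajima's (1993) relative rate test, which asks whether a candidate (`seq1`) has accumulated more changes than its intact paralog or ortholog (`seq2`) since both split from an outgroup. It assumes the three sequences are already aligned; a candidate on a "long branch" shows up as `m1` much larger than `m2`.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's (1993) relative rate test for three aligned sequences.

    m1 = sites where only seq1 differs (seq2 and outgroup agree),
    m2 = sites where only seq2 differs (seq1 and outgroup agree).
    Under a molecular clock m1 and m2 are expected to be equal.
    Returns (m1, m2, chi2, p) with the chi-square test on 1 degree of freedom.
    """
    m1 = m2 = 0
    for a, b, o in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if not {a, b, o} <= set("ACGT"):  # skip gaps and ambiguous bases
            continue
        if a != b and b == o:
            m1 += 1
        elif a != b and a == o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p = math.erfc(math.sqrt(chi2 / 2))  # P(X > chi2) for chi-square with 1 df
    return m1, m2, chi2, p
```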
This is true for processed (retrotransposed) pseudogenes, which are mostly intronless, but, to my knowledge, exon-intron structures are conserved when the duplication occurred through segmental duplication, even though the pressure to conserve them is weaker. If you have any reference to share about this, I would be interested in reading it.
Also, retrotransposition does not occur at the same frequency in all organisms, so this may depend on the species you are working on.
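One way to tell these two origins apart is to look at how the parent transcript aligns to the candidate locus: a processed copy typically aligns across the parent's exon-exon junctions in one contiguous block, while a copy from segmental duplication tends to show intron-sized genomic gaps between aligned blocks. The helper below is a hypothetical heuristic, assuming alignment blocks on the plus strand with the parent transcript as query (e.g. from a spliced aligner); the `min_intron` threshold is arbitrary.

```python
def junctions(exon_lengths):
    """Parent-transcript positions (1-based) of the last base of each exon but the last."""
    out, pos = [], 0
    for length in exon_lengths[:-1]:
        pos += length
        out.append(pos)
    return out


def classify_candidate(exon_lengths, blocks, min_intron=50):
    """Hypothetical heuristic: guess how a pseudogene candidate arose.

    exon_lengths: exon lengths of the parent transcript, in transcript order
    blocks: aligned segments (q_start, q_end, g_start, g_end), 1-based, sorted,
            q = parent-transcript coordinates, g = genomic coordinates (plus strand)
    """
    juncs = junctions(exon_lengths)
    # A single contiguous block covering both sides of a junction: intron lost
    for q_start, q_end, _, _ in blocks:
        if any(q_start <= j < q_end for j in juncs):
            return "processed"
    # Intron-sized genomic gaps between consecutive blocks: structure kept
    gaps = [nxt[2] - prev[3] - 1 for prev, nxt in zip(blocks, blocks[1:])]
    if any(g >= min_intron for g in gaps):
        return "duplicated"
    return "unclear"
```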
I have 6 pseudogenes derived from my gene of interest, but one of them is remarkable, and I want to know how it was produced.
Could you guide me on this, please?
Thank you in advance.
Also, looking at RNA-seq data might be helpful. You can find out whether a gene is transcribed or not, at least in the tissue of your sample! It is also useful for checking whether the annotation you are using is accurate.
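A quick way to summarize expression is transcripts per million (TPM), computed from per-feature read counts (e.g. from featureCounts or htseq-count) and feature lengths. This sketch assumes those counts are already in a dict; the 1 TPM cutoff is an arbitrary, hypothetical choice, and lack of expression in one tissue does not by itself prove a candidate is a pseudogene (some pseudogenes are transcribed).

```python
def tpm(counts, lengths):
    """Transcripts per million from raw read counts and feature lengths in bp."""
    rates = {g: counts[g] / (lengths[g] / 1000) for g in counts}  # reads per kb
    total = sum(rates.values())
    return {g: (r / total * 1e6 if total else 0.0) for g, r in rates.items()}


def likely_silent(tpm_values, threshold=1.0):
    """Features below an (arbitrary) TPM threshold in this sample."""
    return sorted(g for g, v in tpm_values.items() if v < threshold)
```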
Thank you very much! I understand what I should do now.
There is no such thing as "neutral selection". I think what you mean to say is "Pseudogenes generally evolve under no selective constraint"