Hi, I used Glimmer to predict ORFs in a draft genome, and I found that some ORFs have an insertion or deletion relative to their homologous genes in other genomes. But when I used these homologous genes to search the draft genome sequence, I got a complete match over the total length. So, is the predicted ORF wrong, or does the prediction need to be tuned again? How can I get credible gene prediction results? Thank you for your reply.
More details are needed: how did you do the search, which BLAST program did you use (blastp, tblastx; DNA or amino-acid database), and is it a prokaryote?
It might be that there are multiple copies of your test set of genes in your draft genome and one has a frame-shift. Use tblastx to detect those. (Also, Glimmer doesn't "predict ORFs". ORFs do not need to be predicted; it predicts whether ORFs are protein-coding or not.)
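To illustrate the point that ORFs do not need to be predicted, here is a minimal Python sketch that simply enumerates them on both strands. It assumes ATG as the only start codon and the standard stop codons (prokaryotes also use GTG and TTG as starts); the function names and the `min_len` parameter are made up for this example, and this is not how Glimmer itself works.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}  # assumption: ATG only; GTG/TTG are ignored here


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def orfs_in_strand(seq: str, min_len: int = 30) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for every start...stop ORF in one strand.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    The first start codon after the previous stop is used as the ORF start.
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    found.append((frame, start, i + 3))
                start = None
    return found


def find_orfs(seq: str, min_len: int = 30) -> List[Tuple[str, int, int, int]]:
    """Enumerate ORFs on both strands as (strand, frame, start, end).

    Minus-strand coordinates refer to the reverse-complemented sequence.
    """
    result = [("+", f, s, e) for f, s, e in orfs_in_strand(seq, min_len)]
    rc = reverse_complement(seq)
    result += [("-", f, s, e) for f, s, e in orfs_in_strand(rc, min_len)]
    return result


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTGGGTAACC", min_len=9))
```

Every ORF found this way is only a candidate; deciding which candidates are protein-coding is the part a gene finder does.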
I used the gene sequence to search the genome sequence (blastn), and the genome is a prokaryotic genome. There are no other copies of the test genes. The position of the ORF is located within the aligned region of the genome. If I use the protein sequence to search the genome and get a local alignment result, can I conclude that this genome has the protein-coding gene? Thank you!
So, when you blastn the DNA sequence of gene A (draft) against B, you get an insertion or deletion, but when you blast B against A with the same parameters you do not get one at the same coordinates? Is that what you are trying to say? That would worry me slightly; however, it is most likely not true. Could you post an example?
There is a complete genome A with a gene a (100 bp), and there is a draft genome B with a predicted ORF b (90 bp). When I align gene a with ORF b, there is a 10 bp deletion at the N terminus of ORF b ('*' means match, and '-' means deletion).
When I use gene a to search genome B, there is an alignment covering the total length.
I think you have been tricked by the alignment heuristic, which somehow doesn't score the more complete alignment higher than the incomplete one. You can use SSEARCH or EMBOSS water if in doubt, but I wouldn't be worried.
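For anyone curious what a non-heuristic local alignment does, here is a toy Smith-Waterman sketch in Python. It uses a simple match/mismatch score and a linear gap penalty, with values picked arbitrarily for illustration; SSEARCH and EMBOSS water use proper scoring matrices and affine gap penalties, so treat this as a rough illustration only. The sequences in the example are invented.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Exhaustive local alignment of a against b.

    Returns (best score, aligned part of a, aligned part of b).
    Scoring values are illustrative assumptions, not tool defaults.
    """
    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # match or mismatch
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    gene = "ATGGCTAGCTAA"                 # invented query gene
    genome = "TTTT" + gene + "CCCC"       # invented genome containing it
    print(smith_waterman(gene, genome))
```

Because every cell of the matrix is filled in, the full-length match cannot be missed the way it can with a seeded heuristic, at the cost of time proportional to the product of the two sequence lengths.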