A Doubt About Gene Prediction
11.2 years ago
Ontheway ▴ 10

Hi, I used Glimmer to predict ORFs in a draft genome, and I found that some ORFs have an insertion or deletion relative to their homologous genes in other genomes. But when I used these homologous genes to search the draft genome sequence, I got a complete match over the full length. So, is the predicted ORF wrong, or does the prediction need to be tuned again? How can I get credible gene prediction results? Thank you for your reply.

gene • 2.8k views

Need more details: how did you do the search, which BLAST program did you use (blastp, tblastx), was the database DNA or amino acid, and is it a prokaryote?
It might be that there are multiple copies of your test set of genes in your draft genome and one has a frameshift. Use tblastx to detect those. (Also, Glimmer doesn't "predict ORFs"; ORFs do not need to be predicted. It predicts whether ORFs are protein-coding or not.)


I used the gene sequence to search the genome sequence (blastn), and the genome is prokaryotic. There are no additional copies of the test genes. The ORF is located within the aligned region of the genome. If I use the protein sequence to search the genome and get a local alignment, can I conclude that this genome has the protein-coding gene? Thank you!


So, when you blastn the DNA sequence of gene A (draft) against B, you get an insertion or deletion, but when you blast B against A with the same parameters you do not get one at the same coordinates? Is that what you are trying to say? That would worry me slightly, but it is most likely not true. Could you post an example?


There is a complete genome A with a gene a (100 bp), and a draft genome B with a predicted ORF b (90 bp). When I align gene a with ORF b, there is a 10 bp deletion at the N terminus of ORF b ('*' means match and '-' means deletion):

a: ***********************************************************
b: -----*******************************************************

When I use gene a to search genome B, I get a full-length alignment:

a: ***********************************************************
B: ***********************************************************
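
One way to make this comparison concrete is to work with coordinates instead of alignment pictures. The sketch below is hypothetical: the function names `locate` and `uncovered` and the toy sequences are illustrative and not from this thread. It finds an exact copy of a gene in a genome on either strand and counts how many bases of that hit fall outside a predicted ORF's interval. It assumes an exact match; with real data the coordinates would more likely come from BLAST or SSEARCH output.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def locate(gene, genome):
    """Return (start, end, strand) of the first exact match of gene in genome.

    Coordinates are 0-based, end-exclusive, on the forward strand.
    Returns None when neither strand contains an exact copy.
    """
    gene, genome = gene.upper(), genome.upper()
    pos = genome.find(gene)
    if pos != -1:
        return pos, pos + len(gene), "+"
    pos = genome.find(revcomp(gene))
    if pos != -1:
        return pos, pos + len(gene), "-"
    return None


def uncovered(hit, orf):
    """Count bases of the hit interval lying left and right of the ORF interval."""
    left = max(0, orf[0] - hit[0])
    right = max(0, hit[1] - orf[1])
    return left, right


# Toy data (hypothetical): a 33 bp gene embedded in a short "genome",
# and a predicted ORF that starts 9 bp into the gene.
gene_a = "ATGAAACGTATTGCTGGTCTGCTGAAAGAATAA"
genome_b = "TTGACC" + gene_a + "GGATCC"
hit = locate(gene_a, genome_b)
orf_b = (hit[0] + 9, hit[1])

print(hit)                         # (6, 39, '+')
print(uncovered(hit[:2], orf_b))   # (9, 0)
```

If the hit covers the whole gene while the predicted ORF interval is shorter on one side, the sequence itself is present in the draft, and the difference lies in where the predicted ORF begins or ends.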

I think you have been tricked by the alignment heuristic, which somehow doesn't score the more complete alignment higher than the incomplete one. You can use SSEARCH or EMBOSS water if in doubt, but I wouldn't be worried.

11.2 years ago
Bill Pearson ★ 1.0k

This is a well-understood problem with using ORF finders like Glimmer (and pretty much anything else) on data that is likely to have insertion/deletion (frameshift) errors. When FASTX was published in 1997 (Pearson et al. (1997) Genomics 46:24-36), we showed that there were many genes in a recently sequenced bacterial genome that could be extended by alignment with frameshifts.
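
To illustrate the frameshift effect, here is a minimal Python sketch. The `longest_orf` function and the toy sequences are hypothetical and far simpler than what Glimmer does: it only reports the longest ATG-to-stop ORF on the forward strand. A single-base deletion near the 5' end of a toy gene makes this naive finder report a shorter ORF that begins at an internal ATG.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Returns (0, 0) when no complete ORF is found.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon seen in this frame
            elif start is not None and codon in STOPS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


# Toy gene (hypothetical): ATG AAA GCT ATG GCT x6 TAA, 33 bp in total.
gene = "ATGAAAGCTATG" + "GCT" * 6 + "TAA"

# Simulate a sequencing error: delete one base from the AAA codon.
draft = gene[:4] + gene[5:]

print(longest_orf(gene))   # (0, 33): the full gene
print(longest_orf(draft))  # (8, 32): starts at the internal ATG
```

In the draft sequence the region upstream of the deletion is still present and would still align at the nucleotide level, but it is no longer in the same reading frame as the rest of the gene, so the ORF-based view loses it.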

I would argue that there is no reason to look for open reading frames. FASTX (and BLASTX, but you need to turn on the option that allows frameshifts) will find all the genes you can find with ORF finders, and more (because they are not limited to a minimum ORF length).
