6.6 years ago
bitpir
Hi,
I don't quite understand the output in the .predict file from the Glimmer 3.02 ORF predictor.
Here's a sample of the output file:
ref|NC_023013.1| Haloarcula hispanica N601 chromosome 1, complete sequence
orf00001 1 1575 +1 18.88
orf00003 2355 1645 -1 12.87
According to the documentation, column 1 = ID, column 2 = start of gene, column 3 = stop of gene, column 4 = reading frame, column 5 = the per-base "raw" score of the gene.
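For reference, here is a minimal Python sketch of reading those five columns. It assumes the layout shown in the sample above (whitespace-separated fields, with a sequence header line that does not parse as five fields), and the names `Prediction` and `parse_predict` are made up for illustration, not part of Glimmer. The length is simply the distance between the two coordinates, so it would be wrong for a gene that wraps around the origin of a circular sequence.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    """One gene line from a .predict file (hypothetical helper type)."""
    orf_id: str
    start: int
    stop: int
    frame: int        # e.g. +1 or -1; the sign gives the strand
    raw_score: float  # column 5, the per-base "raw" score

    @property
    def strand(self) -> str:
        return "+" if self.frame > 0 else "-"

    @property
    def length(self) -> int:
        # Assumes the gene does not wrap around the end of the sequence.
        return abs(self.stop - self.start) + 1


def parse_predict(text: str) -> List[Prediction]:
    """Parse gene lines and skip anything that is not five parseable fields."""
    predictions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # sequence header or blank line
        try:
            predictions.append(
                Prediction(fields[0], int(fields[1]), int(fields[2]),
                           int(fields[3]), float(fields[4]))
            )
        except ValueError:
            continue  # five fields, but not numeric where expected
    return predictions


sample = """ref|NC_023013.1| Haloarcula hispanica N601 chromosome 1, complete sequence
orf00001 1 1575 +1 18.88
orf00003 2355 1645 -1 12.87
"""

for p in parse_predict(sample):
    print(p.orf_id, p.strand, p.length, p.raw_score)
```

With the length available next to column 5, it is easy to try out different ways of combining the two once it is clear how the score should be computed.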
My questions are:
- To calculate the ORF score (100 * log-odds ratio) of the gene, do I multiply column 5 by the length of the gene?
- Is there a good threshold (either for column 5 or for the calculated score) for judging whether a predicted ORF is likely to be real?
Thanks for the help!
I see more than one question :-)
Good catch! Thought of another question but forgot to change the grammar! :)
A bit pragmatic maybe, but it all really depends on how you run Glimmer.
Plain Glimmer3 runs often underpredict genes and don't get the start codon right. The included iterative workflow builds a first model, derives a PWM (position weight matrix) for the most likely Shine-Dalgarno site along with a better estimate of the start codon distribution, and then reruns Glimmer using this information. The resulting gene model is far more accurate than the initial one.
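To make the PWM idea concrete, here is a generic Python sketch of a log-odds position weight matrix built from already-aligned motif instances. This is not what Glimmer's workflow does internally; it only illustrates the concept. The function names, the example sites, the pseudocount and the uniform 0.25 background are all assumptions, and a uniform background is a poor fit for a GC-rich genome.

```python
import math
from collections import Counter
from typing import Dict, List

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 1.0,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Return one {base: log2 odds} dict per motif position."""
    sites = [s.upper() for s in sites]
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + len(BASES) * pseudocount
    pwm = []
    for pos in range(width):
        counts = Counter(s[pos] for s in sites)
        # Pseudocounts keep unseen bases from producing log(0).
        pwm.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score(pwm: List[Dict[str, float]], seq: str) -> float:
    """Sum the per-position log-odds for a candidate site of motif width."""
    return sum(col[b] for col, b in zip(pwm, seq.upper()))


# Made-up, pre-aligned example sites (illustration only).
example_sites = ["AGGAGG", "AGGAGG", "GGGAGG", "AGGTGG"]
pwm = build_pwm(example_sites)
print(score(pwm, "AGGAGG"), score(pwm, "TTTTTT"))
```

A candidate upstream site that resembles the training sites scores higher than an unrelated one, which is the kind of signal that can help choose between alternative start codons.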
Thanks for the info. About the iterative workflow: I often run into problems generating the PWM. It works for some files but not others. Is this common?