Question

Non-Coding Markov Model In Prokaryotic Gene Prediction

1

12.6 years ago

aniket.schneider ▴ 10

I'm building a very basic Markov model-based prokaryotic gene finder for a class project, and I have been reading some literature about GLIMMER for guidance. If I have understood the basic algorithm correctly, GLIMMER scores a given ORF in all six reading frames, normalizes the six scores so that they represent a probability that the ORF is a gene, and then predicts a gene if the ORF scores above a certain threshold in the correct reading frame (with some filtering for overlaps after this). I have two questions that I hope someone more familiar with these types of algorithms can help me with.
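As a rough illustration of the six-frame scoring step described above, here is a minimal sketch. It assumes a fixed-order, 3-periodic Markov chain with pseudocounts, which is a simplification and not GLIMMER's interpolated model. The function names (`train_periodic`, `score`, `six_frame_scores`) and the toy training sequence are hypothetical, chosen only for this example.

```python
import math
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]

def train_periodic(genes, k=2):
    """Count (context -> next base) separately for each codon position.

    Assumes every training gene starts at codon position 0.
    """
    counts = [defaultdict(lambda: defaultdict(float)) for _ in range(3)]
    for gene in genes:
        for i in range(k, len(gene)):
            counts[i % 3][gene[i - k:i]][gene[i]] += 1
    return counts

def log_prob(counts, phase, context, base, pseudo=1.0):
    """Smoothed log P(base | context) at the given codon position."""
    seen = counts[phase].get(context, {})
    total = sum(seen.values()) + 4 * pseudo
    return math.log((seen.get(base, 0.0) + pseudo) / total)

def score(seq, counts, frame, k=2):
    """Log-likelihood of seq if its first base sits at codon position `frame`."""
    return sum(
        log_prob(counts, (i + frame) % 3, seq[i - k:i], seq[i])
        for i in range(k, len(seq))
    )

def six_frame_scores(orf, counts, k=2):
    """Scores for frames 0-2 on the forward strand, then 0-2 on the reverse."""
    return [score(s, counts, f, k) for s in (orf, revcomp(orf)) for f in range(3)]

# Toy usage: one made-up "gene" as training data, one in-frame test ORF.
toy_counts = train_periodic(["ATG" + "GCT" * 20 + "TAA"])
print(six_frame_scores("GCT" * 10, toy_counts))
```

The six numbers are raw log-likelihoods, so they are all negative and grow more negative with ORF length. That is why some form of normalization is needed before applying a threshold.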

First, they mention earlier in the paper that intuitively one would want to have a seventh model for non-coding regions, but that this is "not strictly necessary". I'm not sure I understand why this isn't necessary. I imagine a situation where an ORF scores very poorly in all six reading frames, but the normalization makes the correct reading frame stand out, so it appears to be a gene. Wouldn't you need a non-coding model as a reference point?

Second, and probably related, how does one actually do this normalization? Is it as simple as just scaling the six scores so that they add up to 1.0? Or is there a more general way of normalizing the score from a Markov Model that accounts for the length of the sequence?
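To make both questions concrete, here is a minimal sketch of the two normalizations mentioned above. It is one possible reading of "scaling the six scores so that they add up to 1.0", assuming the scores are log-likelihoods and all six frames are equally likely a priori. It is not confirmed to be what GLIMMER does. The function names and the example scores are made up for illustration.

```python
import math

def normalize_log_scores(log_scores):
    """Convert log-likelihoods into values that sum to 1.

    Subtracting the maximum first keeps exp() from underflowing
    on long sequences (the usual log-sum-exp trick).
    """
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]

def per_base_log_odds(log_score, background_log_score, length):
    """Length-normalized log-odds against a reference (e.g. non-coding) model."""
    return (log_score - background_log_score) / length

# Hypothetical scores: `poor` is `good` shifted down by 100 log units,
# i.e. every frame fits the model far worse.
good = [-50.0, -70.0, -72.0, -75.0, -71.0, -74.0]
poor = [s - 100.0 for s in good]

print(normalize_log_scores(good))
print(normalize_log_scores(poor))
```

Both calls print the same probabilities, because sum-to-one scaling only sees differences between frames. This is the situation raised in the first question: without an absolute reference such as a background model or a per-base threshold, a uniformly poor ORF is indistinguishable from a good one after normalization.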

Please point out any egregious misunderstandings, as I am only just beginning to study these methods.

• 2.3k views

ADD COMMENT • link updated 6.4 years ago by Biostar 20 • written 12.6 years ago by aniket.schneider ▴ 10

score 0 · Answer 1 · 2012-05-03

0

12.6 years ago

Niek De Klein ★ 2.6k

If all six reading frames score poorly, normalization won't push one of the scores over the threshold. A non-coding model is not strictly necessary because if all six frames score low, the region is not coding and is therefore, by elimination, non-coding. There are different methods of normalization. Did they not describe theirs in the paper?

ADD COMMENT • link 12.6 years ago by Niek De Klein ★ 2.6k

0

The paper didn't describe the normalization method. I'm trying to sort through the source code for it, but I haven't had much luck yet.

ADD REPLY • link 12.6 years ago by aniket.schneider ▴ 10

0

What is the name of the paper?

ADD REPLY • link 12.6 years ago by Niek De Klein ★ 2.6k

0

Microbial gene identification using interpolated Markov models

ADD REPLY • link 12.6 years ago by aniket.schneider ▴ 10