I'm trying to learn about computational biology, and in this lecture the professor describes an example of how a gene-finding prediction model works. He then explains why models should be designed to predict along the plus and minus strands simultaneously, instead of designing a single-strand prediction model and running it on each strand separately. He says that false positives can arise because, in a gene region, codon frequencies are non-random on both strands. The codons on one strand cast a "shadow" on the other strand, and this can fool a predictor that doesn't have access to the other strand's information into reporting false-positive gene hits.

I don't understand this at all. If the opposite strand is non-coding, why would the codons on the gene-containing strand "cast a shadow" on it that could fool a prediction algorithm? Shouldn't the base-pair triplets on the other strand be nonsense that contains no gene patterns and wouldn't fool a gene-prediction HMM?
This is the link to the part of the lecture I'm talking about: https://youtu.be/uBZAWM612_E.
I don't understand this "codon shadow" idea, or what it means for codons to be "linked at the 3rd position". Can someone please help clarify this for me?
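A minimal sketch, not taken from the lecture, that may help frame the "shadow" idea. It builds a hypothetical toy coding region from a small, deliberately GC-biased codon set (the codon list and the helper names `revcomp`, `phase_gc` and `triplets` are illustrative assumptions, not real codon-usage data or any library's API) and then reads the opposite strand. The minus strand is just the reverse complement, so every plus-strand codon XYZ reappears on it as the triplet comp(Z)comp(Y)comp(X). The biased third codon position therefore becomes the first position of a minus-strand triplet, and a single-strand model looking only at the minus strand would see periodic, non-random triplet statistics there even though nothing is encoded on that strand.

```python
from collections import Counter

# Hypothetical, deliberately biased codon set: every codon ends in G or C,
# loosely mimicking strong third-position bias. Not real codon-usage data.
CODONS = ["GCC", "AAG", "TTC", "ATC", "CTG", "GAG", "AAC", "TAC"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq: str) -> str:
    """Return the reverse complement, i.e. the opposite strand read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def phase_gc(seq: str) -> list[float]:
    """Fraction of G/C at each of the three triplet positions (frame 0)."""
    counts = [0, 0, 0]
    totals = [0, 0, 0]
    for i, base in enumerate(seq):
        totals[i % 3] += 1
        if base in "GC":
            counts[i % 3] += 1
    return [c / t for c, t in zip(counts, totals)]


def triplets(seq: str, frame: int = 0) -> list[str]:
    """Non-overlapping triplets starting at the given offset."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


# Toy "gene" on the plus strand: the codon list repeated 10 times (80 codons).
sense = "".join(CODONS * 10)
antisense = revcomp(sense)

print("plus-strand GC by codon position:   ", phase_gc(sense))
print("minus-strand GC by triplet position:", phase_gc(antisense))

# Triplet counts in the minus-strand frame that lines up with the real codons.
shadow = Counter(triplets(antisense, 0))
print("distinct minus-strand triplets:", len(shadow), "of 64")
print("stop codons in that minus-strand frame:", sum(shadow[s] for s in STOPS))
```

With this toy input the plus strand has GC fractions of 0.375, 0.125 and 1.0 at codon positions 1 to 3, and the minus strand shows the same values in reverse order (1.0, 0.125, 0.375). Only 8 of the 64 possible triplets occur in the aligned minus-strand frame, and none of them happens to be a stop codon, so that frame is also a long open reading frame. This is a contrived case; how strong the shadow is in a real genome depends on its actual codon usage.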