Sometimes, to me as a biologist, the assumptions underlying established bioinformatic methods seem to oversimplify the biological mechanisms they try to model. I'm currently studying Markov chains, which seem to be used in several fields, including sequence analysis, with each nucleotide or amino acid being a "state". Compared to FSMs, Bayesian networks and Petri nets, they rest on a very restrictive assumption: the Markov property says that a state depends only on the previous state. I fail to see how this is acceptable in biology at all!

The most evident incompatibility: how can any Markov chain still be applicable to nucleotides, given the triplet nature of codons, i.e. the degeneracy of the genetic code? The transition probabilities from the first to the second base must be completely different from those from the second to the third!? Furthermore, the third base also depends on the first, not only on the second.

The only use I can see for Markov chains is as the "random sequence model" against which the probability of a given sequence is compared, but even there the points above would make it a poor random sequence model. I haven't yet understood what else Markov chains could be used for...
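To make the setup in the question concrete, here is a minimal Python sketch. It is not taken from any established tool: the function names and the toy sequences are invented for illustration, and it assumes plain DNA over A, C, G, T with a pseudocount of 1. It estimates first-order transition probabilities P(next base | previous base), scores a query as a log-odds ratio against a uniform "random sequence" model, and separately counts transitions by codon position, so the concern about first-to-second vs. second-to-third base transitions can be checked on real coding sequences.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: plain DNA, no ambiguity codes


def transition_probs(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev).
    A pseudocount is added to every cell so unseen transitions are non-zero."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    # Normalise each row so the probabilities leaving a state sum to 1.
    return {a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
            for a in ALPHABET}


def log_prob_markov(seq, trans):
    """log P(seq) under the chain, conditioning on the first base."""
    return sum(math.log(trans[p][n]) for p, n in zip(seq, seq[1:]))


def log_prob_uniform(seq):
    """log P(seq) under an i.i.d. uniform background, also conditioning on
    the first base so the two scores cover the same positions."""
    return (len(seq) - 1) * math.log(1 / len(ALPHABET))


def phase_transition_counts(cds):
    """Count dinucleotide transitions separately by codon position of the
    preceding base: phase 0 = 1st->2nd, phase 1 = 2nd->3rd,
    phase 2 = 3rd base -> 1st base of the next codon.
    Assumes cds is in frame, starting at the first base of a codon."""
    counts = [defaultdict(int) for _ in range(3)]
    for i in range(len(cds) - 1):
        counts[i % 3][cds[i] + cds[i + 1]] += 1
    return counts


# Toy data, invented purely for illustration.
train = ["ACGTACGTACGT", "ACGTTACG"]
trans = transition_probs(train)
query = "ACGTACG"
score = log_prob_markov(query, trans) - log_prob_uniform(query)
print(score > 0)  # prints True: the query resembles the training data

phases = phase_transition_counts("ATGGCCATTGTAATG")
print(dict(phases[0]))  # prints {'AT': 3, 'GC': 1, 'GT': 1}
```

A positive log-odds score means the chain explains the query better than the uniform background. On real coding sequences one would compare the three phase tables with each other; the toy counts here are far too small to say anything either way.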
Further examples are motifs, repeated sequences, hydrophobic vs. hydrophilic regions in a protein, etc. If anything, the occurrence of a given nucleotide or amino acid in a sequence seems to depend least of all on the immediately preceding nucleotide or amino acid...
Thanks for any explanation.
In the little studying I've done, it seems the most applicable HMM comparison is between the assembled genome and the reference genome? But I'd really like someone to point out what's wrong with this...