So I've read a few threads on here, and it seems there are quite a few questions about what SPECIFICALLY makes up a hidden Markov model.
I'll do my very best to present this information clearly.
(Please correct any of this explicitly if it's incorrect, or just edit it, as I've made this a community wiki.)
primary assumption:
The application of HMMs here is to compare a reference genome to one currently being assembled.
NOT in the sense that the current nucleotide is the sole determinant of the next one in the sequence, which is incorrect even if we're talking from the amino acid standpoint (i.e. 3 base pairs). See http://biostar.stackexchange.com/questions/1221/when-can-a-markov-model-be-described-as-hidden. Right? OK.
secondary assumptions:
"memoryless" >>> "Markov property": http://en.wikipedia.org/wiki/Markov_property
"Stochastic" >>> non-deterministic: http://en.wikipedia.org/wiki/Stochastic_process
The property of being "hidden"
Questions:
1) Memoryless means it is not influenced by the previous sequences (findings / states / residues / determinations / whatever the previous "items" were). How can one say that the evolutionary precursors to a genomic sequence have "no impact"?
NOTE: It makes sense that the entire gamut of evolution which led up to that genome would influence it.
2) Specifically, what gives this process its designation of "hidden" (in "Hidden Markov Model")? As mentioned in this post: http://biostar.stackexchange.com/questions/1221/when-can-a-markov-model-be-described-as-hidden
It is stated ( from wiki link ) that "the state is not directly visible, but output, dependent on the state, is visible."
NOTE: The sequence is there and we're reading it with some process... there is nothing "hidden" about this.
Does the "hidden" refer to not knowing the evolutionary process which placed that nucleotide there?
OR
Does the "hidden" designation refer to something process-dependent, say sequencing done with fluorescing molecules (which are indicative of a specific nucleotide) rather than a "direct read"?
By "direct read" I mean "Yes, unequivocally, this is a cytosine molecule here": not inferred through a causal relation, but read directly.
Please Specifically label and answer questions 1 and 2 =]
HMM can be used for so many things. What specific application are you talking about? HMMER?
@lh3 I was speaking for the application of DNA sequencing ... specifically de novo
What is the exact "application of DNA sequencing"? What do you mean by "de novo"? Are you thinking of finding de novo SNPs from a family trio using sequencing data (like Conrad et al. and Roach et al.)? In the small area of sequence analysis alone, HMMs can model so many things, with hidden states interpreted very differently. You should give the exact biological problem you are thinking about.
@lh3 specifically the application of HMMs to sequencing an individual human's genome, with multiple reads, where the HMM is trained on those multiple alignments to deduce the consensus genome
Anyway, your question is too vague. Without a description of your problem, I am not even sure whether the problem you have in mind can be solved by an HMM in the first place. There are things common to all HMMs, but until you thoroughly understand HMMs in one very specific application, there is no way you can understand them in a more general context. Your trouble is that you mix abstract concepts with detailed applications, which is really confusing to me. Sorry. In all, please be specific.
@delinquentme I'd second the suggestion to be more specific about the application of HMMs you are interested in understanding. Typically you are not using an HMM to predict nucleotides. As you point out, we're observing the nucleotides from our sequencing experiments. The goal of an HMM is to make predictions about some biological property from the observed sequence. For example, we observe a sequence and we want to know where the genes are. The observed data are the nucleotide labels, and the hidden property is "in gene" or "not in gene".
@delinquentme, are you thinking of using an HMM to align sequences? In that case the nucleotide labels in the sequences are the observed data, and the hidden states are whether each position represents an insertion, a deletion, or a substitution.
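To make the gene-finding example concrete, here is a minimal sketch of decoding hidden "gene" / "intergenic" labels from observed nucleotides with the Viterbi algorithm. Everything specific in it is an assumption for illustration only: the two state names, all start, transition, and emission probabilities, and the toy sequence are made up, and real gene finders use far richer models.

```python
import math

# Hypothetical two-state HMM; every number below is made up for illustration.
STATES = ("gene", "intergenic")
START = {"gene": 0.5, "intergenic": 0.5}
TRANS = {
    "gene": {"gene": 0.9, "intergenic": 0.1},
    "intergenic": {"gene": 0.1, "intergenic": 0.9},
}
EMIT = {
    "gene": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},        # GC-rich (assumed)
    "intergenic": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},  # AT-rich (assumed)
}

def viterbi(seq):
    """Return the most probable hidden-state path for a non-empty sequence."""
    # score[s] = log-probability of the best path so far that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            # Best previous state to come from when entering state s
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointer[s] = prev
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
        backpointers.append(pointer)
        score = new_score
    # Trace back from the best final state
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]

observed = "ATATGCGCGCATAT"   # the part we can see
labels = viterbi(observed)    # the part we have to infer
for base, label in zip(observed, labels):
    print(base, label)
```

With these made-up parameters the GC-rich middle of the toy sequence is labeled "gene" and the AT-rich flanks "intergenic". The nucleotides themselves are never in doubt; only the labels are hidden.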
@charles ... but wouldn't that be deduced from getting multiple coverage .. and simply figuring out which is the most statistically probable sequence?
If you are thinking to infer a consensus without gaps, I do not see the point of using HMM. Most simplistic methods will work sufficiently well. HMM becomes really powerful when you start to deal with gaps, but my impression is you have not been prepared for such complexity. Read my BAQ paper [PMID:21320865]. Not the same, but very relevant.
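As a rough illustration of the kind of simple method that works for a gap-free consensus, here is a column-wise majority vote over reads that are assumed to be already aligned, equal in length, and free of gaps. The function name and the toy reads are hypothetical, and this is only a sketch: it ignores base qualities and anything else a real consensus caller would use.

```python
from collections import Counter

def consensus(aligned_reads):
    """Column-wise majority vote over equal-length, gap-free aligned reads."""
    result = []
    for column in zip(*aligned_reads):
        # most_common(1) returns the most frequent base in this column;
        # ties go to whichever base was seen first
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)

reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAT"]  # toy data, made up
print(consensus(reads))  # ACGTAC
```

No hidden states are needed here because every column can be decided on its own. Once reads can contain insertions or deletions, the columns stop lining up and this approach breaks down.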
Do you thoroughly understand the few simple examples in Richard Durbin's "Biological sequence analysis"? If not, understand those examples first and then revisit your own questions.
@delinquentme, as lh3 says, you don't need an HMM if you are just piling up reads and throwing out the ones with mismatches. On the other hand, HMMs are useful if you want to start worrying about whether a mismatch between an assembled sequence and a reference genome is a SNP or a sequencing error. Is there some particular program or paper you are trying to understand? HMMs are used to solve all sorts of problems, from speech recognition to sequence alignment, gene finding, and protein structure determination; the details and the vocabulary vary from problem to problem.
@delinquentme, with respect to question one: if you are talking about sequence alignment, the HMM isn't concerned with the different states of the sequence over evolutionary history, except for the assumption that the sequences have a common ancestor. Rather we're looking at how states change as we move from left to right in the sequence. Suppose we are doing sequence alignment with gaps. We can't observe gaps, we can only infer them. Roughly speaking the probability that a position is a gap depends primarily on whether or not the position immediately to its left is a gap.
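The "depends only on the position immediately to its left" idea can be sketched as a first-order Markov chain over alignment states. This is a simplified illustration under stated assumptions: it uses only two states ("match" and "gap"), whereas real alignment HMMs distinguish insertions from deletions, and the transition probabilities are made up.

```python
# Hypothetical transition probabilities between alignment states (made up).
TRANS = {
    "match": {"match": 0.95, "gap": 0.05},  # opening a gap is rare
    "gap": {"match": 0.60, "gap": 0.40},    # extending a gap is much likelier
}

def path_probability(path, start="match"):
    """Probability of a state path under a first-order Markov chain."""
    prob = 1.0
    prev = start
    for state in path:
        # Each factor looks only at the state immediately to the left
        prob *= TRANS[prev][state]
        prev = state
    return prob

one_long_gap = ["match", "gap", "gap", "match"]
two_short_gaps = ["gap", "match", "gap", "match"]
print(path_probability(one_long_gap))
print(path_probability(two_short_gaps))
```

Both paths contain two gap positions, but the single two-position gap comes out more probable than two separate one-position gaps, because a gap is likelier right after another gap. That is all "memoryless" means here: memory along the sequence is limited to the neighboring position. It says nothing about evolutionary history having no impact.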
@Charles is there a succinct text on sequencing, gaps, insertions and deletions? I'm a programmer who's pretty new to biology, and I'd love to wrap my head around this more.