We have used a Markov model with several window sizes to generate random genomes. I am wondering if it would be accurate to describe this model as "hidden" or solely as a Markov model.
From Wikipedia:
In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but output, dependent on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states. Note that the adjective 'hidden' refers to the state sequence through which the model passes, not to the parameters of the model; even if the model parameters are known exactly, the model is still 'hidden'.
It depends on how you are generating the sequences; we would need more information to identify the type of model used. You can edit your question to add those details.
Generating sequences from a given alphabet with a known probability per symbol and a specified correlation (first order, second order, a given window size, etc.) is always a simple Markov model. Most random genome generators use such models. In this case you know the state of the chain at every time step, and the transition probabilities too. You just don't know the output in advance, i.e., the random genome.
In the case of an HMM, you know the genome and want to fit the model, or to determine which state of the model each part of the genome belongs to.
That's why we use Markov models for bootstrapping purposes and HMMs for family/relationship inference and clustering.
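To make the "simple Markov model" case concrete, here is a minimal sketch of an order-k generator. It is not the poster's actual code: the function names (`train_markov`, `generate`), the uniform fallback for unseen contexts, and the assumption that the window size k is at least 1 are all mine. The point is that the state (the last k nucleotides) is fully visible at every step.

```python
import random
from collections import Counter, defaultdict


def train_markov(seq, k):
    """Count which nucleotide follows each k-mer in seq (assumes k >= 1)."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - k):
        counts[seq[i:i + k]][seq[i + k]] += 1
    return counts


def generate(counts, k, length, seed_kmer, rng):
    """Extend seed_kmer to `length` symbols using the counted transitions."""
    out = list(seed_kmer)
    while len(out) < length:
        state = "".join(out[-k:])  # the visible state: the last k symbols
        followers = counts.get(state)
        if not followers:
            # Context never seen in training: fall back to a uniform draw
            # (an illustrative choice, not part of the original description).
            out.append(rng.choice("ACGT"))
            continue
        symbols, weights = zip(*followers.items())
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out)


if __name__ == "__main__":
    template = "ACGTTGCAACGTAGCTAGCTAGGATCCA"  # made-up training sequence
    model = train_markov(template, k=2)
    print(generate(model, k=2, length=40, seed_kmer="AC", rng=random.Random(0)))
```

Changing `k` corresponds to changing the window size; the model stays an ordinary Markov chain either way.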
While this isn't a 'cut-and-dry' answer the rule of thumb I've used is that:
In a normal Markov model you are using a single set of transition probabilities to generate a random sequence which follows a specific pattern. In a hidden Markov model you are using a separate "hidden state" which determines which set of transition probabilities is used.
The classic example is scanning a set of sequences for gene coding and non-gene coding regions. Since they have different transition probabilities one can distinguish between them by looking at a window of sequences.
If your generative process has two (or more) sets of NT transition probabilities (i.e., one for gene coding and one for intergenic regions) then it's a Hidden Markov Model. If you simply define a single set of transition probabilities then it's a normal Markov Model.
Hope that clears up the difference,
Will
That's not true. Two sets of transition probabilities just make a nested Markov chain or a coupled one. For a model to be hidden, you cannot have access to the state of the chain; that's why you need some "hint" variables to estimate where you are in the chain. The basic question in an HMM is "Given this information and these parameters (transition probabilities), what's my probable state?". This is quite cut & dry.
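The "what's my probable state?" question can be sketched with the Viterbi algorithm, which returns the most probable hidden state path for an observed sequence. Everything numeric here is an assumption for illustration: the two state names, the start, transition, and emission probabilities are made up and not biologically calibrated, and each state emits nucleotides independently, which is the simplest HMM form rather than the per-state transition tables mentioned above.

```python
import math

# Hypothetical two-state model; all probabilities are illustrative only.
STATES = ["intergenic", "coding"]
START_P = {"intergenic": 0.5, "coding": 0.5}
TRANS_P = {
    "intergenic": {"intergenic": 0.9, "coding": 0.1},
    "coding": {"intergenic": 0.1, "coding": 0.9},
}
EMIT_P = {
    "intergenic": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "coding": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path for a non-empty obs."""
    # Log-probability of the best path ending in each state at position 0.
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for symbol in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            score[s] = (prev[best] + math.log(trans_p[best][s])
                        + math.log(emit_p[s][symbol]))
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi("AAAAAAGGGGGG", STATES, START_P, TRANS_P, EMIT_P))
```

Here the observer sees only the nucleotides; the state labels are inferred, which is what makes the model "hidden".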
Interesting to learn about this subject. I was unaware of the distinction. :)
Usually in population genetics we use a "regular" Markov model, not a hidden one; everything is observable. In other areas, HMMs are more commonly used.