What is the (non-mathematical) interpretation of this figure of how gene scan works?
Sorry, I am more of a biologist than a statistician (or whatever the exact scientific field HMMs belong to!). Can it be interpreted in one of these ways?
1) A sequence starts out in state N; as bases are added, the sequence is "checked" to see whether it remains in N or "goes" to P (when, based on base content, the probability of being in N drops below a threshold), and so on.
2) Every possible placement of the sequence on the Markov chain (every assignment of bases to states) is somehow checked, and the most probable results are returned.
3) Or am I way off, and better off leaving the whole thing alone?
I would explain it as follows: as you can see in the image you attached, it is something like a flow diagram (https://www.sparxsystems.com.au/enterprise_architect_user_guide/14.0/guidebooks/tools_ba_data_flow_diagram.html). The difference is that it was built automatically from a set of sequences, and it captures the essential characteristics of that set.
Then a new sequence is run through it, and the HMM assigns a probability that the input sequence belongs to the same group of sequences the HMM was built on, based on the presence of those characteristics that define the original set of sequences.
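To make "assigns a probability" a bit more concrete, here is a minimal sketch of the forward algorithm on a toy two-state HMM. Everything in it is an assumption for illustration: the two states (`N` for intergenic, `G` for gene-like), the function name `sequence_likelihood`, and all the numbers are made up and are not the model in the figure, which has many more states.

```python
# Toy two-state HMM; every number here is made up for illustration.
STATES = ("N", "G")  # N = intergenic, G = gene-like (hypothetical labels)
START = {"N": 0.5, "G": 0.5}
TRANS = {
    "N": {"N": 0.9, "G": 0.1},
    "G": {"N": 0.1, "G": 0.9},
}
EMIT = {
    "N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # assumed AT-rich
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},  # assumed GC-rich
}


def sequence_likelihood(seq):
    """Forward algorithm: P(seq | model), summed over all state paths."""
    # alpha[s] = probability of the bases seen so far, ending in state s
    alpha = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for base in seq[1:]:
        alpha = {
            s: EMIT[s][base] * sum(alpha[p] * TRANS[p][s] for p in STATES)
            for s in STATES
        }
    return sum(alpha.values())


print(sequence_likelihood("ATGCGC"))
```

The point is that the model never commits to one state per base here; it adds up the probability of every way the sequence could have travelled through the diagram. For long sequences you would work with log-probabilities or rescale `alpha` at each step, since the raw product underflows.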
Think of an HMM as the 'most likely' variations of a given sequence. Instead of worrying about whether the first base is an A, for example, it is represented as a set of probabilities of being an A, T, C or G, and so on for the rest of the sequence.
Thank you. How is it then determined, based on the probabilities you mentioned, whether a sequence belongs to the 'gene' class or the intergenic class?
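On that last question, one common approach, and roughly option 2 in the original post, is to find the single most probable assignment of bases to states with the Viterbi algorithm, without literally enumerating every path. Below is a minimal sketch on a toy two-state model. The states (`N` for intergenic, `G` for gene-like), the function name `viterbi`, and all the probabilities are made up for illustration; as far as I know, real gene finders use a much richer model with more states and explicit length distributions.

```python
import math

# Toy two-state HMM; every number here is made up for illustration.
STATES = ("N", "G")  # N = intergenic, G = gene-like (hypothetical labels)
START = {"N": 0.5, "G": 0.5}
TRANS = {
    "N": {"N": 0.9, "G": 0.1},
    "G": {"N": 0.1, "G": 0.9},
}
EMIT = {
    "N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # assumed AT-rich
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},  # assumed GC-rich
}


def viterbi(seq):
    """Return the most probable state path for seq, one label per base."""
    # score[s] = log-probability of the best path so far that ends in state s
    score = {s: math.log(START[s] * EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            # Best previous state from which to enter state s
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointer[s] = prev
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state to recover the whole path
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("ATATATATGCGCGCGC"))
```

With these made-up numbers, the AT-rich first half is labelled `N` and the GC-rich second half `G`. So there is no fixed threshold that a base has to cross: the label of each base depends on the whole sequence, because switching states has a cost (the small transition probability) that has to be paid back by better-fitting emissions.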