I am trying to understand the concept behind profile HMMs, and I have a question about emission probabilities.
Every consensus position in a profile HMM has a match, an insertion, and a deletion state. The match state's emission probabilities are estimated from the residues observed at that position in the sequences it was trained on.
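For the match state, a minimal Python sketch of that estimate: count the residues in one consensus column, add a pseudocount to every symbol, and normalize. The uniform (Laplace) pseudocount of 1 is an assumption for illustration only; real tools such as HMMER use more elaborate priors (Dirichlet mixtures), so treat this as a sketch, not what any particular program does.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def match_emissions(column, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Estimate match-state emission probabilities for one consensus column.

    column: the characters found at this consensus column across the training
        sequences. Gap characters ('-') are skipped, because a gap in a
        consensus column is a pass through the delete state, not an emission.
    pseudocount: assumed uniform (Laplace) pseudocount added to every symbol,
        so residues never seen in the column still get a non-zero probability.
    """
    counts = Counter(c for c in column if c in alphabet)
    total = sum(counts.values()) + pseudocount * len(alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}
```

For a column "AAAC-", the four residues plus 20 pseudocounts give a denominator of 24, so A gets 4/24, C gets 2/24 and every unseen amino acid gets 1/24. With few training sequences the pseudocounts noticeably flatten the distribution.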
What do the emission probabilities of the insertion and deletion states look like, and how are they trained? Would the probability just be split evenly across all the symbols?
Thanks. I think I got it now. So the insertion state has the probabilities of the "consensus insertions". I guess the deletion state would have to be silent, since there aren't any symbols to represent it in the observed sequence.
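To make the "silent deletion state" point concrete, here is a minimal Python sketch that scores a sequence along one fixed state path. The state names (B, E, M1, D2, ...), the dictionary layout and the numbers in the example are hypothetical; the only point is that a D state contributes a transition term but no emission term, and it consumes no residue.

```python
import math


def path_log_prob(path, seq, trans, emit):
    """Log-probability of emitting `seq` along a fixed state `path`.

    path: state names such as "M1", "I1", "D2" (hypothetical naming scheme).
    trans: dict mapping (from_state, to_state) -> transition probability,
        including the assumed begin state "B" and end state "E".
    emit: dict mapping each emitting state -> {symbol: probability}.
        Delete states have no entry because they are silent.
    """
    logp = 0.0
    pos = 0      # index of the next residue in seq to be emitted
    prev = "B"   # hypothetical begin state
    for state in path:
        logp += math.log(trans[(prev, state)])
        if not state.startswith("D"):
            # Match and insert states emit exactly one residue.
            if pos >= len(seq):
                raise ValueError("path emits more residues than the sequence has")
            logp += math.log(emit[state][seq[pos]])
            pos += 1
        # Delete states: transition term only, no residue consumed.
        prev = state
    logp += math.log(trans[(prev, "E")])
    if pos != len(seq):
        raise ValueError("path does not emit the whole sequence")
    return logp
```

For example, with a three-node model the two-residue sequence "AC" can be aligned by the path M1, D2, M3: three states are visited but only two residues are emitted, because D2 just skips consensus position 2.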
Small clarification: insertion states model the "stretch" of consecutive non-consensus columns between two consensus columns (possibly of zero length). The length of the stretch is used to estimate the self-transition probability, which models the insertion length. The emission probabilities for the insert state are calculated from the residues in the stretch, but, if I am not mistaken, in HMMER profiles they are usually dominated by the pseudocounts.
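A rough Python sketch of both estimates for a single insert state, under simplifying assumptions: insertion length is modelled as geometric, so a stretch of length L uses the I->I transition L-1 times and leaves the insert state once; zero-length stretches only affect how often the insert state is entered at all (a simplified M->I estimate that ignores deletions in the next node); and the pseudocounts are a hypothetical background distribution scaled by a hypothetical weight of 20. HMMER's actual priors differ, so this only illustrates why few observed residues end up dominated by pseudocounts.

```python
from collections import Counter


def insert_parameters(stretches, background, pseudocount_weight=20.0):
    """Estimate one insert state's parameters from the observed insert stretches.

    stretches: one string per training sequence with the residues found in the
        non-consensus columns after this consensus position ("" = zero length).
    background: dict symbol -> background frequency, used here as the
        pseudocount distribution (an assumption for illustration).
    pseudocount_weight: hypothetical total pseudocount mass.
    Returns (p_open, p_self, emissions).
    """
    nonempty = [s for s in stretches if s]
    # Simplified M->I estimate: fraction of sequences that insert anything here.
    p_open = len(nonempty) / len(stretches) if stretches else 0.0
    # Geometric length model: each stretch of length L contributes L-1
    # self-transitions and one exit from the insert state.
    extra = sum(len(s) - 1 for s in nonempty)
    exits = len(nonempty)
    p_self = extra / (extra + exits) if exits else 0.0
    # Emissions: observed residue counts mixed with background pseudocounts.
    counts = Counter("".join(nonempty))
    n = sum(counts.values())
    emissions = {
        a: (counts[a] + pseudocount_weight * f) / (n + pseudocount_weight)
        for a, f in background.items()
    }
    return p_open, p_self, emissions
```

With stretches "", "GA", "G", "" and a uniform DNA background, half of the sequences open an insertion, the self-transition estimate is 1/3, and the emissions come out as G = 7/23, A = 6/23, C = T = 5/23: only three observed residues against 20 pseudocounts, so the result stays close to the background.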