I'm a bit confused by the alignment page of a Pfam entry. They provide the 'seed' alignment from which the initial HMM is created. They also provide the 'full' alignment that, to my understanding, is created by searching UniProtKB and adding those proteins that are significantly similar to the seed HMM. The text below is copied from their tutorial.
"Each Pfam entry is represented by a set of aligned sequences with their probabilistic representation - called a profile hidden Markov model (HMM). The profile HMM is trained on a small representative set of aligned sequences that are known to belong to the family (the 'seed' alignment). This model is then used to search exhaustively against a large sequence database (e.g. UniProtKB) to find all homologous sequences. Those sequences that are significantly similar to the model are aligned to the profile HMM in order to provide the full alignment."
In addition, they provide alignments against representative proteomes, UniProt and NCBI databases. Here is where I'm confused: the sequence count for UniProt is larger than that for full. Shouldn't these be the same? Perhaps the UniProt alignment contains all sequences returned by a search of UniProt, whereas the full alignment is a proper subset of those, limited to the sequences deemed appropriate for inclusion in the actual HMM.
Am I understanding this correctly?
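One way I could test the subset idea myself is to compare the sequence names in the two downloaded alignments. Below is a minimal sketch, assuming both alignments are saved in Stockholm format. The function names and the toy alignments are my own (hypothetical) and are not taken from Pfam.

```python
def stockholm_ids(text):
    """Return the set of sequence names found in a Stockholm-format alignment."""
    ids = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, markup/annotation lines (#...) and the end marker (//).
        if not line or line.startswith("#") or line == "//":
            continue
        # A sequence line is "<name> <aligned sequence>"; the name is the first field.
        ids.add(line.split()[0])
    return ids


def compare(full_text, uniprot_text):
    """Summarise how the 'full' names relate to the 'UniProt' names."""
    full = stockholm_ids(full_text)
    uniprot = stockholm_ids(uniprot_text)
    return {
        "full": len(full),
        "uniprot": len(uniprot),
        "full_is_proper_subset": full < uniprot,  # strict subset test on sets
        "only_in_full": sorted(full - uniprot),
    }


# Toy data (made up) standing in for the two downloaded files.
FULL = """# STOCKHOLM 1.0
#=GF ID toy_family
SEQ1_TOY/1-5   ACD-E
SEQ2_TOY/3-7   ACDFE
//
"""

UNIPROT = """# STOCKHOLM 1.0
SEQ1_TOY/1-5   ACD-E
SEQ2_TOY/3-7   ACDFE
SEQ3_TOY/2-6   AC-FE
//
"""

result = compare(FULL, UNIPROT)
print(result)
```

If `only_in_full` comes back non-empty on the real files, then the full alignment is not simply a subset of the UniProt one, and my guess above is wrong.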
An additional question: it seems that the word 'alignment' is often used to mean the HMM. Am I to understand that, in this case, the 'full' alignment is the HMM built from all sequences in the full MSA?
Thanks for any guidance provided.