Hi all, I am trying to generate plausible new aligned sequences from an MSA.
My current idea is to use HMMER's hmmbuild to build a profile HMM, then run hmmemit followed by hmmalign to generate an aligned sequence.
However, it seems that hmmbuild realigns the sequences, so the final aligned sequence does not line up with my original MSA.
Does anyone have any suggestions? Thanks, David
Example
The starting MSA (PF00313 from Pfam); one row is shown:
A0A1G7SQH8.1/3-67 -QGF.V..K...W..F.......NA...E......K....G...F..G............F......I........G..........P..........D...........D.........G..........G............E.......D..........V..F..VH..F....S..A.I......E...D..RG.................gF.R..S...L......D.....E.....G.A...R....V..E.....Y..E.ASP.........GQR....G...L...Q.A.D.RVTP-
Build the HMM:
hmmbuild -o test.log -O test_alignment.txt test.hmm PF00313.uniprot
Checking the alignment that hmmbuild saved with -O (test_alignment.txt) shows that the alignment has shifted:
A0A1G7SQH8.1/3-67
~QGF.V..K...W..F.......NA...E......K....G...F..G............F......I........G..........P..........D...........D.........G..........G............E.......D..........V..F..VH..F....S..A.I......E...D..RG.................gF.R..S...L......D.....E.....G.A...R....V..E.....Y..E.ASP.........GQR....G...L...Q.A.D.RVTP
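One way to check whether hmmbuild actually changed the sequence, rather than only its gap symbols, is to strip every gap and missing-data character from both rows and compare what is left. A minimal Python sketch, assuming the usual Stockholm conventions: '-' and '.' are gaps, and '~' is the missing-data symbol used by HMMER's Easel library (here it replaces the leading '-'). The function names are hypothetical.

```python
# Gap ('-', '.') and missing-data ('~') symbols; everything else is a residue.
GAP_SYMBOLS = set("-.~")


def residues(aligned: str) -> str:
    """Return the residues of an aligned row, uppercased, with all gap symbols removed."""
    return "".join(c for c in aligned if c not in GAP_SYMBOLS).upper()


def same_sequence(row_a: str, row_b: str) -> bool:
    """True if two aligned rows contain the same residues in the same order."""
    return residues(row_a) == residues(row_b)


# Start of the A0A1G7SQH8.1/3-67 row before and after hmmbuild -O.
print(same_sequence("-QGF.V..K...W", "~QGF.V..K...W"))
```

Running same_sequence on the full A0A1G7SQH8.1/3-67 rows from PF00313.uniprot and test_alignment.txt is a quick way to see whether only the gap symbols moved.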
Emitting a sequence with hmmemit gives an unaligned (gapless) sequence:
hmmemit test.hmm
CSD-sample1
IDGTMCTAAATSIFKKTFGFIHQHNLPEDSYKSCTYLVHSSTVEKFLQVVKPAELLCFDVEKVGPYPVGGANALQIRS
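The hmmbuild, hmmemit, hmmalign pipeline from the question can be scripted as below. This is a sketch, not a tested recipe: it assumes HMMER 3 is on the PATH and reuses the file names from the example, while emitted.fa and emitted_plus_original.sto are hypothetical output names. hmmemit -N sets how many sequences to sample, and hmmalign's --mapali option, which the HMMER help describes as including the alignment the HMM came from, puts the original rows into the same output alignment as the new sequences.

```python
import subprocess

# test.hmm and PF00313.uniprot come from the question;
# EMITTED and COMBINED are hypothetical output names.
HMM = "test.hmm"
MSA = "PF00313.uniprot"  # the alignment the HMM was built from
EMITTED = "emitted.fa"
COMBINED = "emitted_plus_original.sto"


def pipeline_commands(n_samples: int = 10) -> list[list[str]]:
    """Return the argv for each step: build the HMM, sample sequences, align them."""
    return [
        # Same hmmbuild call as in the question.
        ["hmmbuild", "-o", "test.log", "-O", "test_alignment.txt", HMM, MSA],
        # Sample n_samples unaligned sequences from the model into EMITTED.
        ["hmmemit", "-N", str(n_samples), "-o", EMITTED, HMM],
        # Align them to the HMM; --mapali also includes the training MSA in the output.
        ["hmmalign", "--mapali", MSA, "-o", COMBINED, HMM, EMITTED],
    ]


def run_pipeline(n_samples: int = 10) -> None:
    """Run the steps in order; check=True raises CalledProcessError if one fails."""
    for argv in pipeline_commands(n_samples):
        subprocess.run(argv, check=True)
```

Calling run_pipeline(50) would write 50 sampled sequences to emitted.fa and the combined alignment to emitted_plus_original.sto. Even with --mapali, the insert columns may be laid out differently from the input file, so it is safer to compare rows on the HMM's match columns than to expect a character-for-character match with the original MSA.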
If you want hmmemit to sample an alignment, have you tried hmmemit -a? This is in the help page and in the documentation:
Options controlling what to emit:
  -a : emit alignment
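An alignment sampled this way (or produced by hmmalign) will usually not have the same insert columns as the original Pfam MSA, since insert columns only exist where some row of that particular alignment has an insertion. Rows aligned to the same HMM can still be compared position by position by keeping only the match (consensus) columns. A minimal sketch, assuming the alignment carries a #=GC RF line in which insert columns are marked '.', as HMMER's Stockholm output does; match_columns is a hypothetical helper.

```python
def match_columns(row: str, rf: str) -> str:
    """Keep only the characters of `row` that sit in match (consensus) columns.

    `rf` is the '#=GC RF' line of the same alignment; columns marked
    '.' (or '-') in RF are insert columns and are dropped.
    """
    if len(row) != len(rf):
        raise ValueError("row and RF line must have the same length")
    return "".join(c for c, r in zip(row, rf) if r not in ".-")


# Toy example: five match columns ('x') with two insert columns ('.') in between.
print(match_columns("AC.gD-E", "xx..xxx"))
```

After projection every row has exactly one character per match state, so a sampled row and a row from the original MSA can be compared column by column.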
Could you provide the command line you are running and illustrate the issue with an example?
I've added an example by editing the original question.
What do you mean by "plausible aligned sequences from the MSA"?
If you have an MSA, don't you already have aligned sequences to use?
Yes, I have lots of sequences, but I would like new sequences that are not necessarily in the MSA yet still fit the HMM. (In the same way that a trained HMM can recognize new sequences, not only those in the training MSA, except that here I want to emit sequences rather than search for them.)
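To make "emitting from the model" concrete, here is a deliberately simplified sketch: a toy profile with only match and delete states (no insert states, no parsing of a HMMER file) and made-up emission probabilities. Each sample is produced directly in aligned form, one character per match state, which is why emitted sequences can always be placed in the model's match columns. The real hmmemit samples from the full profile HMM in test.hmm; TOY_PROFILE and emit_aligned are hypothetical.

```python
import random

# Hypothetical toy profile: for each match state, a residue distribution
# and the probability of skipping that state via a delete state.
TOY_PROFILE = [
    ({"Q": 0.7, "E": 0.3}, 0.0),
    ({"G": 0.9, "A": 0.1}, 0.1),
    ({"F": 0.6, "Y": 0.4}, 0.0),
    ({"V": 0.5, "I": 0.5}, 0.2),
]


def emit_aligned(profile, rng: random.Random) -> str:
    """Sample one aligned row: a residue per match state, or '-' for a delete."""
    row = []
    for emissions, p_delete in profile:
        if rng.random() < p_delete:
            row.append("-")
        else:
            residues, weights = zip(*emissions.items())
            row.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(row)


rng = random.Random(0)
for i in range(3):
    aligned = emit_aligned(TOY_PROFILE, rng)
    # Print the aligned form and the unaligned (gapless) form side by side.
    print(f"sample{i + 1}", aligned, aligned.replace("-", ""))
```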