Hello everyone,
I am trying to build a HMMER profile for searching protein sequences in a tailored database. I have a set of multiple sequence alignments in FASTA format that I have already converted to Stockholm format using Biopython. The HMMER user manual states that it is possible to merge multiple Stockholm files into one and then build the HMMER profile from this multi-Stockholm file.
Here is where my problem begins. I am using one of the minitools in the HMMER suite, the esl-alimerge command.
I run it as follows:
esl-alimerge --amino -o multistockholm.sto --list file_list.txt
I get the following error:
Error, all alignments must have #=GC RF annotation; alignment 0 of file 1 does not
Of course this error is self-explanatory, so I am trying to add the #=GC RF line to each of the Stockholm files.
I wrote a script that builds a consensus sequence for each file, and I have added the corresponding consensus sequence to each Stockholm file as its #=GC RF line. I now get this error:
Error, all alignments must have identical non-gap #=GC RF lengths; expected (RF length of first ali read): 322, alignment 1 of file 2 length is 609 (PROT.mafft.sto))
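The two numbers in this message are counts of non-gap characters in each file's #=GC RF line. A minimal sketch for inspecting those counts before running esl-alimerge is shown below. It is only an illustration: the function name `rf_summary` is hypothetical, it handles a single alignment per file, and the set of characters treated as gaps (`.`, `-`, `_`, `~`) is an assumption rather than something taken from the HMMER manual.

```python
def rf_summary(stockholm_text, gap_chars=".-_~"):
    """Report alignment width, RF length and non-gap RF length for one alignment.

    gap_chars is an assumption about which RF characters count as gaps.
    """
    seq_chunks = {}   # sequence name -> list of aligned chunks (handles interleaved blocks)
    rf_chunks = []    # pieces of the #=GC RF line, in file order
    for line in stockholm_text.splitlines():
        line = line.rstrip()
        if not line or line == "//":
            continue
        if line.startswith("#=GC"):
            fields = line.split()
            if len(fields) == 3 and fields[1] == "RF":
                rf_chunks.append(fields[2])
            continue
        if line.startswith("#"):
            continue  # header and other annotation lines
        fields = line.split()
        if len(fields) < 2:
            continue
        seq_chunks.setdefault(fields[0], []).append(fields[1])
    widths = {len("".join(chunks)) for chunks in seq_chunks.values()}
    rf = "".join(rf_chunks)
    nongap = sum(1 for ch in rf if ch not in gap_chars)
    return {
        "alignment_widths": sorted(widths),
        "rf_length": len(rf),
        "rf_nongap_length": nongap,
    }


# Toy alignment, made up for illustration only.
example = """# STOCKHOLM 1.0
seq1     MK-LV
seq2     MKALV
#=GC RF  xx.xx
//
"""
print(rf_summary(example))
```

Running this on each file in file_list.txt would show which files disagree on the non-gap RF length that the error refers to.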
I understand this to mean that every file must carry exactly the same #=GC RF line. I therefore tried concatenating all the consensus sequences and adding the result as the #=GC RF line of each Stockholm file.
When I do this, I get the following error:
Alignment input parse error:
   unexpected # of aligned annotation in #=GC RF line
   while reading Stockholm file PROT.mafft.sto
   at or near line 51
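In Stockholm format, #=GC lines are per-column annotation, so the RF string needs exactly one character per alignment column. A concatenated consensus is longer than any single alignment, which is consistent with this parse error. The sketch below shows one way an RF line of the right width could be derived from the aligned sequences themselves. The function name `rf_line`, the `x` marker and the 50% occupancy threshold are assumptions chosen for illustration, and the sketch does not by itself make the non-gap RF lengths agree across different files.

```python
def rf_line(aligned_seqs, min_occupancy=0.5, gap_chars="-."):
    """Return one RF character per alignment column.

    A column is marked 'x' (consensus) when at least min_occupancy of the
    sequences have a residue there, and '.' otherwise. Both the marker and
    the threshold are illustrative assumptions.
    """
    width = len(aligned_seqs[0])
    if any(len(s) != width for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    rf = []
    for col in range(width):
        residues = sum(1 for s in aligned_seqs if s[col] not in gap_chars)
        rf.append("x" if residues / len(aligned_seqs) >= min_occupancy else ".")
    return "".join(rf)


# Toy alignment, made up for illustration only.
seqs = ["MK-LV", "MKALV", "M--LV"]
rf = rf_line(seqs)
print("#=GC RF  " + rf)
```

The resulting string always has the same length as the aligned sequences, which is the property the parser appears to be checking here.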
At this point I don't know how else to proceed.
How does the #=GC RF line have to be formatted in order to merge Stockholm files for use with the HMMER suite?
Thank you all for your help in advance.