I have software that translates DNA and searches the translation products for matches to profile HMMs. The idea is to search these DNA sequences for matches to the HMMs in Pfam-A (version 30, currently the latest release).
However, for my purposes there is a better-curated set of proteins, available as FASTA sequences.
I was recently told that FASTA sequences can also be used as HMMER queries; they are simply converted to HMMs on the fly. That is fine.
What I need help with is this: if I have one set of queries as HMMs from Pfam-A and another set of queries as plain FASTA sequences, is it possible to combine them into one *.hmm file in some way?
Thanks, but I don't think that is possible. I suppose I did not make it clear that the FASTA sequences and the Pfam-A HMMs bear no relation to each other, so I cannot align the FASTA sequences to the HMMs using hmmalign and rebuild new, combined HMMs with hmmbuild. If you or someone else has another suggestion, I am open to ideas.
As far as I know, hmmbuild produces one profile HMM per alignment, so combining sequences into one profile means building an HMM from all of the sequences. If the sequences are not related to Pfam families, then it makes sense to create separate HMMs and search your DNA sequences with these. However, you could also identify the Pfam domains represented in your proteins and add these to the corresponding Pfam HMMs. I am also curious as to what "better curated" means. Pfam-A already seems to be a well-curated resource.
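For what it's worth, a HMMER3 *.hmm flat file can hold many independent profiles back to back (Pfam-A.hmm itself is distributed that way), with each profile ending in a line containing only `//`. So if the FASTA-derived HMMs are built separately, the resulting files can probably just be concatenated with the Pfam-A ones. Below is a minimal Python sketch of that idea. The function names and file names are made up for illustration, and it assumes every input is an ASCII HMMER3 profile file; it only checks the record boundaries and rejects duplicate NAME fields, nothing more.

```python
from pathlib import Path


def split_profiles(text):
    """Split HMMER3 flat-file text into individual profile records.

    Each record is assumed to end with a line containing only '//'.
    """
    records, current = [], []
    for line in text.splitlines():
        if not line.strip() and not current:
            continue  # skip blank lines between records
        current.append(line)
        if line.strip() == "//":
            records.append("\n".join(current) + "\n")
            current = []
    if current:
        raise ValueError("last profile is missing its '//' terminator")
    return records


def combine_hmm_files(paths, out_path):
    """Concatenate profile files into one multi-profile file.

    Returns the number of profiles written. Raises ValueError on a record
    that does not start with a HMMER3 header or that reuses a NAME.
    """
    seen_names = set()
    written = 0
    with open(out_path, "w") as out:
        for path in paths:
            for record in split_profiles(Path(path).read_text()):
                if not record.startswith("HMMER3"):
                    raise ValueError(f"{path}: record lacks a HMMER3 header")
                name = None
                for line in record.splitlines():
                    parts = line.split(None, 1)
                    if len(parts) == 2 and parts[0] == "NAME":
                        name = parts[1].strip()
                        break
                if name in seen_names:
                    raise ValueError(f"duplicate profile name: {name}")
                seen_names.add(name)
                out.write(record)
                written += 1
    return written
```

Usage would be something like `combine_hmm_files(["Pfam-A.hmm", "curated.hmm"], "combined.hmm")`, where `curated.hmm` is a hypothetical file holding the profiles built from your FASTA sequences. If the combined file is meant for hmmscan, it would still need to be indexed with hmmpress afterwards.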
When you look at annotations for an individual, well-studied organism (e.g. E. coli), Pfam/Rfam provide only a very general guess, whereas biologists need a specific gene name, etc.