Question

Combining HMMs and fasta for HMMER searches

2

8.3 years ago

Anand Rao ▴ 640

I have software that translates DNA and searches the translation products for matches to profile HMMs. The idea is to search these DNA sequences for matches to the HMMs in PfamA (version 30, currently the latest).

However, for my purposes there is a better-curated set of proteins, available as FASTA sequences.

I was recently told that FASTA sequences can ALSO be used as HMMER queries; they'll just be converted to HMMs on the fly. That is fine.

What I need help with is this: if I have one set of queries as HMMs from PfamA and another set of queries as plain FASTA sequences, is it possible to combine them into one *.hmm file in some way?

HMMER fasta hmm • 4.5k views

ADD COMMENT • link updated 8.3 years ago by Jean-Karim Heriche 27k • written 8.3 years ago by Anand Rao ▴ 640

score 0 · Answer 1 · 2016-08-06

0

8.3 years ago

Jean-Karim Heriche 27k

Given an HMM and some sequences, you can use hmmalign to get a multiple sequence alignment, then build a new HMM from it with hmmbuild. In your case, you would combine the PfamA sequences with your other sequences, align them all to the PfamA HMM, then rebuild the HMM from the resulting multiple sequence alignment.
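
As a rough illustration of that two-step workflow, here is a minimal Python sketch that only assembles the two HMMER command lines. It assumes HMMER3 is installed and that the profile file holds a single family's HMM (Pfam-A.hmm bundles many profiles, so one family would first have to be extracted, e.g. with hmmfetch). All file names and the helper name `realign_and_rebuild_cmds` are hypothetical.

```python
import shlex


def realign_and_rebuild_cmds(family_hmm, seqs_fasta, aligned_sto, rebuilt_hmm):
    """Return the hmmalign and hmmbuild command lines as argument lists.

    family_hmm  : file with ONE profile HMM (hypothetical name)
    seqs_fasta  : family sequences plus the extra sequences, unaligned
    aligned_sto : where hmmalign should write the Stockholm alignment
    rebuilt_hmm : where hmmbuild should write the new profile
    """
    # hmmalign [options] <hmmfile> <seqfile>; -o sends the alignment to a file
    align = ["hmmalign", "-o", aligned_sto, family_hmm, seqs_fasta]
    # hmmbuild [options] <hmmfile_out> <msafile>
    build = ["hmmbuild", rebuilt_hmm, aligned_sto]
    return [align, build]


if __name__ == "__main__":
    # Print the commands only; pass each list to subprocess.run(cmd, check=True)
    # to actually execute them when HMMER is on the PATH.
    for cmd in realign_and_rebuild_cmds(
        "family.hmm", "family_plus_extra.fasta", "combined.sto", "combined.hmm"
    ):
        print(shlex.join(cmd))
```

This only makes sense when the extra sequences really belong to the same family as the profile they are aligned to.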

ADD COMMENT • link 8.3 years ago by Jean-Karim Heriche 27k

0

Thanks, but I don't think that is possible. I suppose I did not make it clear that the FASTA sequences and the PfamA HMMs bear no relation to each other, so I cannot align the FASTA sequences to the HMMs using hmmalign and rebuild new, combined HMMs with hmmbuild. If you or someone else has another suggestion, I am open to ideas.

ADD REPLY • link 8.3 years ago by Anand Rao ▴ 640

0

As far as I know, a *.hmm file contains one profile HMM generated by hmmbuild, so combining sequences into one *.hmm file means building an HMM from all the sequences. If the sequences are not related to Pfam families, then it makes sense to create separate HMMs and search your DNA sequences with these. However, you could also identify the Pfam domains represented in your proteins and add these to the corresponding Pfam HMMs. I am also curious as to what "better curated" means. Pfam-A already seems to be a well-curated resource.
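
For the separate-HMMs route, one possible sketch in Python follows. It relies on the fact that HMMER3 profile files are plain text and that a database such as Pfam-A.hmm is itself many profiles stored one after another, so separately built profiles can be appended to a single file. It assumes that your HMMER version's hmmbuild accepts a one-sequence FASTA file as input, which is worth checking before relying on it. The helper names and the `single_hmms` directory are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            parts = line[1:].split()
            # Fall back to a positional name if the header is empty
            name = parts[0] if parts else f"seq{len(records) + 1}"
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def per_sequence_build_cmds(records, workdir="single_hmms"):
    """One hmmbuild command per sequence; -n gives each profile its own name.

    ASSUMPTION: each sequence has already been written to
    <workdir>/<name>.fasta and hmmbuild accepts one-sequence input.
    """
    return [
        ["hmmbuild", "-n", name, f"{workdir}/{name}.hmm", f"{workdir}/{name}.fasta"]
        for name, _ in records
    ]


def concatenate_profiles(profile_texts):
    """Join the text of several profile files into one multi-profile file."""
    return "".join(t if t.endswith("\n") else t + "\n" for t in profile_texts)
```

The concatenated text can be written next to (or appended to) the Pfam profiles and searched as one file; for hmmscan-style use the combined file would additionally need to be indexed with hmmpress.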

ADD REPLY • link 8.3 years ago by Jean-Karim Heriche 27k

0

I am also curious as to what "better curated" means. Pfam-A seems already a well curated resource.

When looking at annotations for individual well-studied organisms (e.g. E. coli), Pfam/Rfam provide only a very general guess, while biologists need something specific, such as a gene name.

ADD REPLY • link 5.6 years ago by predeus ★ 2.1k