Question

Would you use HMMSCAN or HMMSEARCH when the query is much larger than the database?

3.4 years ago

O.rka ▴ 740

I have an HMM database of 83 HMMs.

I want to use it to pull out all of the hits in NR, which contains many millions of sequences.

Should I use hmmsearch or hmmscan for this?

nr hmmsearch hmmer hmm hmmscan • 1.6k views

score 2 · Answer 1 · 2021-08-04

See here for previous discussions on this issue. It isn't difficult to figure this out on your own: make a small database in lieu of nr, say 1000 proteins, and do both kinds of searches. It should be pretty obvious what works better for your purposes.
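One way to build that small test database is to copy the first 1000 records out of the nr FASTA file. The following is only a sketch: the function name `head_fasta` and the file names are made up for illustration, and it assumes an uncompressed FASTA file.

```python
def head_fasta(src_path, dst_path, n_records=1000):
    """Copy the first n_records FASTA entries from src_path to dst_path.

    Returns the number of records actually written.
    """
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # A new header: stop once we already have enough records.
                if written == n_records:
                    break
                written += 1
            # Skip anything that appears before the first header.
            if written:
                dst.write(line)
    return written


if __name__ == "__main__":
    # Hypothetical file names; adjust to wherever your copy of nr lives.
    count = head_fasta("nr.fasta", "nr_subset.fasta", n_records=1000)
    print(f"wrote {count} records")
```

Taking the first records rather than a random sample is the simplest option; for a timing comparison between the two programs that is usually good enough.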

If you use tabular output (--tblout) and set the same database size for both programs with the -Z switch, you will get identical results with either approach, but hmmsearch will be at least 2-3x faster, possibly even 10x.
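To make the comparison concrete, here is a rough sketch that builds the two command lines with matching settings. The helper `hmmer_commands`, the file names, and the value given to -Z are placeholders I chose for illustration; the options themselves (`--tblout`, `-Z`, `--cpu`, `-o`) are standard in both programs. Note that hmmscan needs the HMM file to be prepared with hmmpress first.

```python
import shlex


def hmmer_commands(hmm_db, seq_db, z, cpus=4):
    """Return (hmmsearch_cmd, hmmscan_cmd) as argument lists.

    Both use the same -Z so that E-values are computed against the
    same effective database size and the tables are comparable.
    """
    common = ["--cpu", str(cpus), "-Z", str(z), "-o", "/dev/null"]
    # hmmsearch: each HMM is the query, the sequence file is the target.
    search = ["hmmsearch", *common, "--tblout", "hmmsearch.tbl", hmm_db, seq_db]
    # hmmscan: each sequence is the query, the (hmmpress'ed) HMM file is the target.
    scan = ["hmmscan", *common, "--tblout", "hmmscan.tbl", hmm_db, seq_db]
    return search, scan


if __name__ == "__main__":
    # Placeholder inputs: 83 HMMs in my_models.hmm, 1000 test proteins.
    search, scan = hmmer_commands("my_models.hmm", "nr_subset.fasta", z=1000)
    print(shlex.join(search))
    print(shlex.join(scan))
```

The printed commands can be pasted into a shell and timed, for example with `time`, to see the speed difference on your own data.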

If you want the alignments, with hmmsearch you will get results that are easier to inspect. Basically you will have 83 chunks of results in the output, where for each HMM the hits will be listed and aligned. With hmmscan you will get 300+ million chunks, because each individual sequence from nr will be searched and aligned against all your HMMs. I would not want to do that. Given that hmmsearch is also faster, to me that's a clear winner.

I would use hmmscan only for a relatively small number of sequences, for example to annotate a proteome of a single species against the Pfam database.