I have an HMM database of 83 HMMs.
I want to use it to pull out all of the hits in NR, which contains many millions of sequences.
Should I use hmmsearch or hmmscan for this?
See here for previous discussions on this issue. It isn't difficult to figure this out on your own: make a small database in lieu of nr, say 1000 proteins, and do both kinds of searches. It should be pretty obvious what works better for your purposes.
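One way to build such a test database is sketched below. It assumes nr is available as a plain FASTA file and simply keeps the first N records, which is not a random sample but is enough to compare the two programs. The function names are made up for this illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fasta_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each FASTA record as a list of lines, header line first."""
    record: List[str] = []
    for line in lines:
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if record:
                yield record
            record = [line]
        elif record:
            # Sequence line belonging to the current record.
            record.append(line)
    if record:
        yield record


def take_first_records(lines: Iterable[str], n: int) -> List[str]:
    """Return the lines of the first n FASTA records."""
    kept: List[str] = []
    for record in islice(fasta_records(lines), n):
        kept.extend(record)
    return kept
```

Typical use would be to open the nr FASTA file, pass the file handle to `take_first_records(handle, 1000)`, and write the returned lines to a new file (for example a hypothetical `nr_subset.fasta`). Because the input is read lazily, the full nr file is never loaded into memory.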
If you use tabular output and the same database size for both programs (the -Z switch, which sets the number of targets used in the E-value calculation), you will get identical results with either approach, but hmmsearch will be at least 2-3x faster, possibly even 10x.
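To check the "identical results" claim on a small test set, the two tabular outputs can be reduced to (HMM, sequence) pairs and compared. The sketch below assumes both files were written with --tblout, where the first column is the target name, the third is the query name, and the sixth is the full-sequence score. In hmmsearch output the target is the sequence, while in hmmscan output the target is the HMM, so the columns are swapped accordingly. The function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def read_pairs(tblout_lines: Iterable[str], program: str) -> Dict[Tuple[str, str], float]:
    """Map (hmm_name, sequence_name) to the full-sequence bit score."""
    pairs: Dict[Tuple[str, str], float] = {}
    for line in tblout_lines:
        # Skip blank lines and the '#' header/footer comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, query, score = fields[0], fields[2], float(fields[5])
        if program == "hmmsearch":
            key = (query, target)  # query is the HMM, target is the sequence
        elif program == "hmmscan":
            key = (target, query)  # target is the HMM, query is the sequence
        else:
            raise ValueError(f"unknown program: {program}")
        pairs[key] = score
    return pairs
```

Comparing the key sets of the two dictionaries shows whether both programs report the same hits. Bit scores do not depend on -Z, so they should match directly; the reported E-values, and therefore which hits pass the E-value threshold, are what the common -Z value brings into agreement.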
If you want the alignments, with hmmsearch you will get results that are easier to inspect. Basically you will have 83 chunks of results in the output, where for each HMM the hits will be listed and aligned. With hmmscan you will get 300+ million chunks, because each individual sequence from nr will be searched and aligned against all your HMMs. I would not want to do that. Given that hmmsearch is also faster, to me that's a clear winner.
I would use hmmscan only for a relatively small number of sequences, for example to annotate a proteome of a single species against the Pfam database.
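For reference, here is a rough sketch of how the two command lines could be assembled for a side-by-side test. It only builds and prints the commands; nothing is executed. The file names are placeholders, and the helper functions are invented for this example. The options used (--tblout, -Z, --cpu, -o) are standard HMMER options, and hmmscan additionally needs the HMM file to be indexed with hmmpress first.

```python
import shlex
from typing import List, Optional


def hmmer_cmd(program: str, hmm_file: str, seq_file: str, tblout: str,
              z: Optional[int] = None, cpu: int = 4) -> List[str]:
    """Build an hmmsearch or hmmscan command that writes tabular output."""
    if program not in ("hmmsearch", "hmmscan"):
        raise ValueError(f"unknown program: {program}")
    # -o /dev/null discards the long human-readable report.
    cmd = [program, "--cpu", str(cpu), "-o", "/dev/null", "--tblout", tblout]
    if z is not None:
        # Same -Z for both programs keeps the E-values comparable.
        cmd += ["-Z", str(z)]
    # Both programs take the HMM file first and the sequence file second.
    return cmd + [hmm_file, seq_file]


def hmmpress_cmd(hmm_file: str) -> List[str]:
    """Build the hmmpress command that hmmscan requires beforehand."""
    return ["hmmpress", hmm_file]


if __name__ == "__main__":
    # Placeholder file names; replace with your own paths.
    print(shlex.join(hmmer_cmd("hmmsearch", "my83.hmm", "nr_subset.fasta",
                               "search.tbl", z=1000)))
    print(shlex.join(hmmpress_cmd("my83.hmm")))
    print(shlex.join(hmmer_cmd("hmmscan", "my83.hmm", "nr_subset.fasta",
                               "scan.tbl", z=1000)))
```

Timing the two printed commands on the same subset (for instance with the shell's time builtin) gives a direct measurement of the speed difference on your own data.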
Thank you, this is what I was looking for.