Question

Set a %coverage threshold during a hmmsearch using HMMer

0

10.1 years ago

CrLs ▴ 10

Hi everyone,

I would like to know if there is a way to set a %coverage threshold using hmmsearch. I've read the user guide, but I haven't found one.

Regards,

C.

sequence • 5.0k views

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 10.1 years ago by CrLs ▴ 10

Ram · Answer 1 · 2014-11-04

1

10.1 years ago

Travis Wheeler ▴ 110

You'll need to provide a definition of "%coverage" to be sure, but the answer is almost certainly "no".

When I hear "%coverage", I think of a measure of shared length, probably "what percent of the sequence is aligned to the model" (or vice versa). If this is what you're after, you may want to use hmmsearch's --tblout flag. This will give you an easy-to-parse space-delimited file that includes start/end-positions for both the query model and the target sequence, along with the total target sequence length. Knowing the model length also, the %coverage math from there is pretty easy to pull off.

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 10.1 years ago by Travis Wheeler ▴ 110

0

Hello Travis,

That's a great answer. Do you have any advice on which parser I should use? Is there one for Python or Perl?

Thanks again!

ADD REPLY • link 10.1 years ago by CrLs ▴ 10

1

I recommend writing your own "parser". First run hmmsearch:

% hmmsearch --tblout my_output.tblout  query.hmm  target.fa

Then look at the format of the file produced by hmmsearch's --tblout flag (in the example above: my_output.tblout). You'll find that this is likely the easiest parsing task you'll ever do. Simply walk through the lines of the table file one at a time, (a) ignoring lines starting with #, and (b) accessing the content of each line using either split or a regex (I'm imagining Perl here). Heck, you could even use awk to parse the file, if you're a fan of awk.
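The line-by-line approach described above can be sketched in Python as follows. This is only a minimal illustration: the function name `parse_hmmer_table` is hypothetical, the sample line is made up for demonstration, and the count of 18 fixed columns before the free-text description is an assumption about the --tblout layout that you should check against the header of your own file.

```python
def parse_hmmer_table(lines, n_fixed):
    """Yield one list of fields per data line of a HMMER table file.

    n_fixed is the number of whitespace-separated columns that come
    before the free-text description (supplied by the caller).
    """
    for line in lines:
        # (a) ignore blank lines and comment lines starting with '#'
        if not line.strip() or line.startswith("#"):
            continue
        # (b) split on whitespace, at most n_fixed times, so that a
        # description containing spaces stays in one final field
        yield line.strip().split(None, n_fixed)


# Made-up sample in the --tblout layout (values are illustrative only).
SAMPLE_TBLOUT = [
    "# target name  accession  query name  accession  E-value ...\n",
    "seq1 - query1 - 1.2e-30 100.5 0.1 1.5e-30 100.2 0.1 "
    "1.0 1 0 0 1 1 1 1 made-up protein\n",
]

hits = list(parse_hmmer_table(SAMPLE_TBLOUT, n_fixed=18))
for fields in hits:
    # target name, query name, full-sequence E-value, description
    print(fields[0], fields[2], float(fields[4]), fields[-1], sep="\t")
```

To read a real file, pass an open file handle instead of the sample list, for example `parse_hmmer_table(open("my_output.tblout"), n_fixed=18)`.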

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.1 years ago by Travis Wheeler ▴ 110

0

OK. Thanks for your time and your answer. I'll try. (I'd give you a 'biostar gold' if it was available.)

ADD REPLY • link 10.1 years ago by CrLs ▴ 10

0

Note that you actually need --domtblout (e.g. --domtblout domains.txt), because --tblout (e.g. --tblout hits.txt) reports only per-sequence scores and doesn't give you per-alignment fields such as the positions.

If you do --domtblout, you get a table with this header (and therefore these fields):

#                                                                            --- full sequence --- -------------- this domain -------------   hmm coord   ali coord   env coord
# target name        accession   tlen query name           accession   qlen   E-value  score  bias   #  of  c-Evalue  i-Evalue  score  bias  from    to  from    to  from    to  acc description of target
#------------------- ---------- ----- -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
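Given that header, a percent-coverage filter could be sketched in Python as below. The function names, the 50% cut-off, and the sample lines are all made up for illustration. The column indices are read off the header above (tlen = 2, qlen = 5, hmm from/to = 15/16, ali from/to = 17/18, counting from 0), and "coverage" is assumed here to mean the aligned span divided by the total length, computed per domain line.

```python
def domain_coverage(line):
    """Return (target, query, model %coverage, target %coverage)
    for one data line of a --domtblout file."""
    # 22 fixed columns precede the free-text description.
    f = line.strip().split(None, 22)
    tlen, qlen = int(f[2]), int(f[5])
    hmm_from, hmm_to = int(f[15]), int(f[16])
    ali_from, ali_to = int(f[17]), int(f[18])
    # Coordinates are 1-based and inclusive, hence the +1.
    model_cov = 100.0 * (hmm_to - hmm_from + 1) / qlen
    target_cov = 100.0 * (ali_to - ali_from + 1) / tlen
    return f[0], f[3], model_cov, target_cov


def filter_by_model_coverage(lines, min_cov):
    """Keep domain hits whose model coverage is at least min_cov percent."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        hit = domain_coverage(line)
        if hit[2] >= min_cov:
            kept.append(hit)
    return kept


# Made-up sample lines in the --domtblout layout (illustrative values).
SAMPLE_DOMTBLOUT = [
    "# target name  accession  tlen  query name ...\n",
    "seq1 - 200 query1 - 100 1e-30 100.0 0.1 1 1 1e-32 1e-30 99.5 0.1 "
    "11 90 21 100 15 105 0.95 made-up protein\n",
    "seq2 - 300 query1 - 100 1e-05 20.0 0.0 1 1 1e-07 1e-05 19.5 0.0 "
    "1 30 50 79 48 82 0.80 another made-up protein\n",
]

kept = filter_by_model_coverage(SAMPLE_DOMTBLOUT, min_cov=50.0)
for target, query, model_cov, target_cov in kept:
    print(target, query, model_cov, target_cov, sep="\t")
```

Each line of the table describes a single domain, so a sequence with several domain hits appears on several lines; combining their coverage into one per-sequence figure is left out of this sketch.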

ADD REPLY • link 3.1 years ago by multimeric ▴ 30