Set a %coverage threshold during an hmmsearch using HMMER
10.1 years ago
CrLs ▴ 10

Hi everyone,

I would like to know if there is a way to set a %coverage threshold using hmmsearch. I've read the user guide, but I haven't found one.

Regards,

C.

sequence • 5.0k views
10.1 years ago

You'll need to provide a definition of "%coverage" to be sure, but the answer is almost certainly "no".

When I hear "%coverage", I think of a measure of shared length, probably "what percent of the sequence is aligned to the model" (or vice versa). If this is what you're after, you may want to use hmmsearch's --tblout flag. This will give you an easy-to-parse space-delimited file that includes start/end-positions for both the query model and the target sequence, along with the total target sequence length. Knowing the model length also, the %coverage math from there is pretty easy to pull off.
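For illustration, the coverage arithmetic could be sketched as below. This is only a sketch: the function name `percent_coverage` is hypothetical, and it assumes the start/end positions are 1-based and inclusive, so the covered span is `end - start + 1`.

```python
def percent_coverage(start, end, total_length):
    """Percent of total_length spanned by the range start..end.

    Assumes 1-based, inclusive coordinates (an assumption of this sketch).
    """
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    if end < start:
        raise ValueError("end must not be smaller than start")
    return 100.0 * (end - start + 1) / total_length


# Example: an alignment spanning model positions 11..90 of a 100-state model.
model_coverage = percent_coverage(11, 90, 100)
```

The same function works for either definition: pass the model coordinates and model length for coverage of the model, or the sequence coordinates and sequence length for coverage of the target.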


Hello Travis,

That's a great answer. Do you have any advice on which parser I should use? Is there one for Python or Perl?

Thanks again!


I recommend writing your own "parser". First run hmmsearch:

% hmmsearch --tblout my_output.tblout  query.hmm  target.fa

Then look at the format of the file produced when using hmmsearch's --tblout flag (in the example above: my_output.tblout). You'll find that this is likely the easiest parsing task you'll ever do. Simply walk through the lines of the table file one at a time, (a) ignoring lines starting with #, and (b) accessing the content of each line using either split or a regex (I'm imagining Perl here). Heck, you could even use awk to parse the file, if you're a fan of awk.
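Since Python was mentioned as an option, the line-by-line approach might look roughly like this. It is a minimal sketch, not a full parser: the function name `read_tblout` is hypothetical, and it assumes the usual --tblout layout of 18 whitespace-separated fields followed by a free-text description (target name first, query name third, full-sequence E-value and score fifth and sixth). Check the header of your own file before relying on those positions.

```python
def read_tblout(lines):
    """Collect a few fields from each data line of a --tblout table."""
    hits = []
    for line in lines:
        # (a) skip comment lines and blank lines
        if line.startswith("#") or not line.strip():
            continue
        # (b) split on whitespace; cap the splits so that the trailing
        # free-text description stays together as one field
        fields = line.split(None, 18)
        hits.append({
            "target": fields[0],
            "query": fields[2],
            "evalue": float(fields[4]),  # full-sequence E-value
            "score": float(fields[5]),   # full-sequence bit score
        })
    return hits
```

To use it on the file from the command above, something like `with open("my_output.tblout") as fh: hits = read_tblout(fh)` should do.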


OK. Thanks for your time and your answer. I'll try it. (I'd give you a 'biostar gold' if it were available.)


Note that you actually need to use --domtblout domains.txt, because --tblout hits.txt doesn't give you per-alignment fields such as the positions.

If you do --domtblout, you get a table with this header (and therefore these fields):

#                                                                            --- full sequence --- -------------- this domain -------------   hmm coord   ali coord   env coord
# target name        accession   tlen query name           accession   qlen   E-value  score  bias   #  of  c-Evalue  i-Evalue  score  bias  from    to  from    to  from    to  acc description of target
#------------------- ---------- ----- -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
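Going by that header, a coverage filter could be sketched in Python along these lines. The function name `domain_coverages` and the 50% cutoff in the usage note are hypothetical. The sketch assumes the 22 whitespace-separated fields shown above followed by the description, 1-based inclusive coordinates, and that for hmmsearch the query is the model (so qlen is the model length and tlen is the sequence length).

```python
def domain_coverages(lines):
    """Yield per-domain coverage values from --domtblout lines."""
    for line in lines:
        # skip the header/comment lines and any blank lines
        if line.startswith("#") or not line.strip():
            continue
        # 22 fixed fields, then the free-text description as one field
        f = line.split(None, 22)
        tlen, qlen = int(f[2]), int(f[5])
        hmm_from, hmm_to = int(f[15]), int(f[16])  # "hmm coord" columns
        ali_from, ali_to = int(f[17]), int(f[18])  # "ali coord" columns
        yield {
            "target": f[0],
            "query": f[3],
            "i_evalue": float(f[12]),
            # percent of the model covered by this domain alignment
            "model_cov": 100.0 * (hmm_to - hmm_from + 1) / qlen,
            # percent of the target sequence covered by this domain alignment
            "target_cov": 100.0 * (ali_to - ali_from + 1) / tlen,
        }
```

A threshold is then a one-line filter, for example `[d for d in domain_coverages(open("domains.txt")) if d["model_cov"] >= 50.0]`. Note that this treats each domain separately; combining several domains on the same sequence into one coverage value would need extra logic.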
