Dear all,
I'm attempting to parse the standard output from HMMER 'hmmsearch' using BioPerl's Bio::SearchIO module. I want to access some basic information for each HSP, such as %-identity and %-query coverage, but am running into some difficulties. Here is a MWE:
#!/usr/bin/perl -w
use strict;
use Bio::SearchIO;
my $in = Bio::SearchIO->new( -file => $ARGV[0], -format => 'hmmer' );
while ( my $result = $in -> next_result ) {
## this is a Bio::Search::Result::HMMERResult object:
print "Query name: ".$result->query_name."\n";
## this should be a Bio::Search::Hsp::HMMERHSP object (but it doesn't behave like one):
while( my $hit = $result->next_hit ) {
while( my $hsp = $hit->next_hsp ) {
my $percent_id = sprintf("%.2f", $hsp->percent_identity);
my $percent_q_coverage = sprintf("%.2f", ((($hsp->length('query')/($result->query_length)))*100));
print "\tHit name: ".$hit->name()."\n";
print "\tIdentity: ".$percent_id."\%\n";
print "\tFraction identical: ".$hsp->frac_identical('total')."\n";
print "\tNumber conserved residues: ".$hsp->num_conserved."\n";
print "\tQuery length: ".$result->query_length."\n";
print "\tAlignment length: ".$hsp->length('query')."\n";
print "\tQuery coverage: $percent_q_coverage\%\n";
## schema:
if ( ($percent_id >= 80) and ($percent_q_coverage >= 80) ) {
print "\t** Result: PRESENT\n\n";
} elsif ( ($percent_id >= 80) and ($percent_q_coverage < 80) ) {
print "\t** Result: TRUNCATED\n\n";
} else {
print "\t** Result: ABSENT\n\n";
}
}
}
}
However, this does not return the expected values for % identity, or for other stats like the proportion of identical sites or number of conserved sites (it returns 0 values in all cases), although it does return the correct query length information. A typical output looks something like:
Query name: querySeq_X
Hit name: hitSeq_Y
Identity: 0%
Fraction identical: 0
Number conserved residues: 0
Query length: 906
Alignment length: 906
Query coverage: 100.00%
** Result: ABSENT
Even when I know that querySeq_X
and hitSeq_Y
are identical (ie., identity should be 100%).
I suspect it's because my script is not accessing the HSP alignments themselves...? I've searched across CPAN etc., but while there are a number of HMMER-related modules available there is not much in the way of working examples for me to figure out what's going on. How can I parse the alignments themselves to access these kinds of data?
Any help would be much appreciated!
Cheers.
I don't see anything wrong with your code, which indicates that there may be an issue with the input. Please tell us the version of BioPerl and HMMER you used because that may be a factor.
Hi SES, I'm using HMMER 3.1b1 and Bioperl 1.6.901 I believe. I think perhaps Michael has a point below - I know that HMMs don't produce an alignment score per se (rather, a sequence score), but I'm surprised you can't get info such as the % identity etc as it clearly does output alignments for each domain hit (with a identical/conserved midline etc.).
A lot of the SearchIO::HowTo documentation talks about these domain alignments being the equivalent (from a parsing ability point of view) to a BLAST HSP, and so should be accessible using standard Bio::SearchIO syntax as my basic script. I also found some of the methods (eg.
my @h_ident = $hsp->seq_inds('query', 'identical');
) work but others do not. This makes me suspect it is a bug in the BioPerl code.....