When I was using BioPerl to parse the blast results, I found that it can parse the blast(2.2.21) result but not the blast+(2.2.25+) result(all were in the default format).
The parser.pl code:
#!/usr/bin/perl
use strict;
use Bio::SearchIO;
my $fi = $ARGV[0];
my $format = $ARGV[1];
my $in = new Bio::SearchIO(
-format => "$format",
-file => "$fi"
);
print "#Query\tHit\tScore\tBits\tEvalue\n";
while ( my $result = $in->next_result ) {
while ( my $hit = $result->next_hit ) {
while ( my $hsp = $hit->next_hsp ) {
print $result->query_name, "\t", $hit->name, "\t",
$hsp->score, "\t", $hsp->bits, "\t", $hsp->evalue, "\n";
}
}
}
The result of parsing blast (it is OK):
> #Query Hit Score Bits Evalue
first gi|195972856|ref|NM_001130955.1| 21 42.1 2e-04
first gi|195972854|ref|NM_015318.3| 21 42.1 2e-04
second gi|261878474|ref|NM_001166295.1| 21 42.1 2e-04
second gi|261878472|ref|NM_001166294.1| 21 42.1 2e-04
second gi|261878551|ref|NM_001166417.1| 21 42.1 2e-04
second gi|261878470|ref|NM_001166293.1| 21 42.1 2e-04
second gi|261878468|ref|NM_014021.3| 21 42.1 2e-04
third gi|115392135|ref|NM_007249.4| 21 42.1 2e-04
The result of parsing blast+ (No output except the header):
> #Query Hit Score Bits Evalue
================================================================
According to BioPerl's wiki (NCBI-BLAST parsing problems), XML format is recommended. But I found that Bioperl has problems in parsing blast or blast+ results in XML format.
The parser.pl is the same; The result of parsing blast in XML format (I have three queries, but it print the first one only!):
> #Query Hit Score Bits Evalue
first gi|195972856|ref|NM_001130955.1| 21 42.1223 0.000184141
first gi|195972854|ref|NM_015318.3| 21 42.1223 0.000184141
first gi|261878474|ref|NM_001166295.1| 21 42.1223 0.000184141
first gi|261878472|ref|NM_001166294.1| 21 42.1223 0.000184141
first gi|261878551|ref|NM_001166417.1| 21 42.1223 0.000184141
first gi|261878470|ref|NM_001166293.1| 21 42.1223 0.000184141
first gi|261878468|ref|NM_014021.3| 21 42.1223 0.000184141
first gi|115392135|ref|NM_007249.4| 21 42.1223 0.000184141
The result of parsing blast+ in XML format (Besides the same problem, it can not get the query id properly!):
> #Query Hit Score Bits Evalue
Query_1 gi|195972856|ref|NM_001130955.1| 42 39.1570490084919 0.000615407041092949
Query_1 gi|195972854|ref|NM_015318.3| 42 39.1570490084919 0.000615407041092949
Query_1 gi|261878474|ref|NM_001166295.1| 42 39.1570490084919 0.000615407041092949
Query_1 gi|261878472|ref|NM_001166294.1| 42 39.1570490084919 0.000615407041092949
Query_1 gi|261878551|ref|NM_001166417.1| 42 39.1570490084919 0.000615407041092949
Query_1 gi|261878470|ref|NM_001166293.1| 42 39.1570490084919 0.000615407041092949
Query_1 gi|261878468|ref|NM_014021.3| 42 39.1570490084919 0.000615407041092949
Query_1 gi|115392135|ref|NM_007249.4| 42 39.1570490084919 0.000615407041092949
Does anybody meet the same problems? What is the problem? How to solve it?
I noticed this and I agree it is annoying. I didn't find a way to consensify the results.
Did you try using Blast 2.2.25 (not plus) and format your database with -parse_seqids? Not entirely sure as this is a long time ago, but I think the database format has slightly changed between then and now.
I have tried Blast 2.2.25 and format the database with -parse_seqids, it has the same problem. I think it is a bug.