Bioperl Has Different Behaviours In Parsing Blast And Blast+ Result
2
1
Entering edit mode
13.4 years ago
Yixf ▴ 20

When I was using BioPerl to parse the blast results, I found that it can parse the blast(2.2.21) result but not the blast+(2.2.25+) result(all were in the default format).

The parser.pl code:

#!/usr/bin/perl
use strict;
use Bio::SearchIO;

my $fi     = $ARGV[0];
my $format = $ARGV[1];
my $in     = new Bio::SearchIO(
    -format => "$format",
    -file   => "$fi"
);

print "#Query\tHit\tScore\tBits\tEvalue\n";
while ( my $result = $in->next_result ) {
    while ( my $hit = $result->next_hit ) {
        while ( my $hsp = $hit->next_hsp ) {
            print $result->query_name, "\t", $hit->name, "\t",
              $hsp->score, "\t", $hsp->bits, "\t", $hsp->evalue, "\n";
        }
    }
}

The result of parsing blast (it is OK):

> #Query    Hit    Score    Bits    Evalue
first    gi|195972856|ref|NM_001130955.1|    21    42.1    2e-04
first    gi|195972854|ref|NM_015318.3|    21    42.1    2e-04
second    gi|261878474|ref|NM_001166295.1|    21    42.1    2e-04
second    gi|261878472|ref|NM_001166294.1|    21    42.1    2e-04
second    gi|261878551|ref|NM_001166417.1|    21    42.1    2e-04
second    gi|261878470|ref|NM_001166293.1|    21    42.1    2e-04
second    gi|261878468|ref|NM_014021.3|    21    42.1    2e-04
third    gi|115392135|ref|NM_007249.4|    21    42.1    2e-04

The result of parsing blast+ (No output except the header):

> #Query    Hit    Score    Bits    Evalue

================================================================

According to BioPerl's wiki (NCBI-BLAST parsing problems), XML format is recommended. But I found that Bioperl has problems in parsing blast or blast+ results in XML format.

The parser.pl is the same; The result of parsing blast in XML format (I have three queries, but it print the first one only!):

> #Query    Hit    Score    Bits    Evalue
first    gi|195972856|ref|NM_001130955.1|    21    42.1223    0.000184141
first    gi|195972854|ref|NM_015318.3|    21    42.1223    0.000184141
first    gi|261878474|ref|NM_001166295.1|    21    42.1223    0.000184141
first    gi|261878472|ref|NM_001166294.1|    21    42.1223    0.000184141
first    gi|261878551|ref|NM_001166417.1|    21    42.1223    0.000184141
first    gi|261878470|ref|NM_001166293.1|    21    42.1223    0.000184141
first    gi|261878468|ref|NM_014021.3|    21    42.1223    0.000184141
first    gi|115392135|ref|NM_007249.4|    21    42.1223    0.000184141

The result of parsing blast+ in XML format (Besides the same problem, it can not get the query id properly!):

> #Query    Hit    Score    Bits    Evalue
Query_1    gi|195972856|ref|NM_001130955.1|    42    39.1570490084919       0.000615407041092949
Query_1    gi|195972854|ref|NM_015318.3|    42    39.1570490084919     0.000615407041092949
Query_1    gi|261878474|ref|NM_001166295.1|    42    39.1570490084919    0.000615407041092949
Query_1    gi|261878472|ref|NM_001166294.1|    42    39.1570490084919    0.000615407041092949
Query_1    gi|261878551|ref|NM_001166417.1|    42    39.1570490084919    0.000615407041092949
Query_1    gi|261878470|ref|NM_001166293.1|    42    39.1570490084919    0.000615407041092949
Query_1    gi|261878468|ref|NM_014021.3|    42    39.1570490084919    0.000615407041092949
Query_1    gi|115392135|ref|NM_007249.4|    42    39.1570490084919    0.000615407041092949

Does anybody meet the same problems? What is the problem? How to solve it?

blast blast bioperl parsing error • 7.0k views
ADD COMMENT
0
Entering edit mode

I noticed this and I agree it is annoying. I didn't find a way to consensify the results.

ADD REPLY
0
Entering edit mode

Did you try using Blast 2.2.25 (not plus) and format your database with -parse_seqids? Not entirely sure as this is a long time ago, but I think the database format has slightly changed between then and now.

ADD REPLY
0
Entering edit mode

I have tried Blast 2.2.25 and format the database with -parse_seqids, it has the same problem. I think it is a bug.

ADD REPLY
3
Entering edit mode
13.4 years ago
Chris Fields ★ 2.2k

The best thing to do for this is to file it as a bug and attach test data. My guess is something changed in the latest BLAST+ text output.

Re: XML, it is supposed to be the most stable output; again, if something isn't working then file this as a bug (again, with example data) so it can be addressed. IIRC, re: query ID the tag BLAST used for reporting this had changed hence the odd return data.

ADD COMMENT
2
Entering edit mode

Here, https://redmine.open-bio.org/projects/bioperl as noted on the main website, http://bioperl.org (see Bugs).

ADD REPLY
0
Entering edit mode

Hi, Chris Fields Thanks for your reply. Do you known the email or website that I can report this bug?

ADD REPLY
0
Entering edit mode

Yeah it looks like it is a Bioperl problem for the output of 2.2.25, not specifically Blast+. Odd that the XML is acting up though.

ADD REPLY
0
Entering edit mode

At least one of the tags changed for query data, I believe

ADD REPLY
2
Entering edit mode
13.4 years ago
Burnedthumb ▴ 90

Instead of using bioperl, you can also specify which field you want using the "-outfmt" tag. Like this:

blastn -db your_database -query your_query.fasta -outfmt "6 qseqid sseqid score bitscore evalue" >> blastout.txt

ADD COMMENT
1
Entering edit mode

The bug has been reported: https://redmine.open-bio.org/issues/3265

ADD REPLY
0
Entering edit mode

I known this. But I want to know why BioPerl can not parse blast result properly.

ADD REPLY

Login before adding your answer.

Traffic: 1352 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6