About Percent_Identity And Bio::Searchio
3
1
Entering edit mode
10.7 years ago
chodar ▴ 10

Hi,

I have a question regards of how Bio::SearchIO calculate the percent identity in an alignment. I have this tabular output (called blasttest2 and its just a minimal fraction of a very large output) from a local blast generated with outfmt 6 option:

A B 44.00 150 76 4 1852 1994 673 821 1e-25 116
A B 44.34 557 235 21 1082 1600 46 565 7e-126 427
A C 56.77 155 58 3 1451 1600 1719 1869 5e-48 191
A C 65.85 82 25 2 1913 1991 2126 2207 8e-26 118

When I use the following code to print the output using Bio::SearchIO, the percent identity obtained are different from the original output. Im missing something on the code?

#!use/bin/perl
use strict;
use warnings;
use Bio::SearchIO;
use Bio::SeqIO;

my $blastin = Bio::SearchIO->new(
    -file   => "blasttest2",
    -format => 'blasttable'
);
while ( my $result = $blastin->next_result ) {
    while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
            print "Query=", $result->query_name,"\tHit=", 
                     $hit->name,"\tpident=",$hsp->percent_identity,"\n";
                  }
             }
}

The output for this is:

Query=MDOMV2T00051 Hit=Phy000TMB5_AEDAE pident=46.6666666666667
Query=MDOMV2T00051 Hit=Phy000TMB5_AEDAE pident=54.0394973070018
Query=MDOMV2T00051 Hit=Phy003Y4FA_AEDAE pident=60.6451612903226
Query=MDOMV2T00051 Hit=Phy003Y4FA_AEDAE pident=67.0731707317073

Thanks for the help.

blast+ bioperl • 3.6k views
ADD COMMENT
1
Entering edit mode
10.7 years ago
SES 8.6k

This is actually a bug relating to how the number of identical sites is calculated, though it has been fixed in recent versions (1.6.901 is about 3 years old). I tested with this specific version and I can see this issue. Rather than try to fix this problem (that has apparently been fixed), the simple solution is to not use BioPerl because you can easily parse this yourself.

The output will be what you expect because this code is simply parsing the file, not deriving statistics by other methods:

$ perl biostars95834.pl
Query=A    Hit=B    PID=44.00
Query=A    Hit=B    PID=44.34
Query=A    Hit=C    PID=56.77
Query=A    Hit=C    PID=65.85
ADD COMMENT
0
Entering edit mode

Ok, got the point.

Thanks very much!

ADD REPLY
0
Entering edit mode
10.7 years ago
SES 8.6k

It appears you are looking at the results of another file based on the read names and the output. Here is the same code with your example blast file so there is no confusion:

#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;
use Bio::SearchIO;

my $blast = Bio::SearchIO->new(-fh => \*DATA, -format => 'blasttable');

while ( my $result = $blast->next_result ) {
    while ( my $hit = $result->next_hit ) {
        while ( my $hsp = $hit->next_hsp ) {
            say join "\t", "Query=".$result->query_name,
            "Hit=".$hit->name,"pid=".$hsp->percent_identity;
        }
    }
}

__DATA__
A B 44.00 150 76 4 1852 1994 673 821 1e-25 116 
A B 44.34 557 235 21 1082 1600 46 565 7e-126 427
A C 56.77 155 58 3 1451 1600 1719 1869 5e-48 191
A C 65.85 82 25 2 1913 1991 2126 2207 8e-26 118

This produces what you would expect:

$ perl biostars95834.pl
Query=A    Hit=B    pid=44
Query=A    Hit=B    pid=44.3447037701975
Query=A    Hit=C    pid=56.7741935483871
Query=A    Hit=C    pid=65.8536585365854
ADD COMMENT
0
Entering edit mode
10.7 years ago
chodar ▴ 10

Hi SES,

Im put my perl script and the blast output in the same new directory to avoid confusions. The "blasttest2" was rename as "blast.res" to avoid some missint "t". So my three files are: test.pl (my code), test2.pl (your code) and blast.res (test file). So the code in test.pl now is now:

-file   => "blast.res",

When I run, the output is:

Query=A    Hit=B    pident=46.6666666666667
Query=A    Hit=B    pident=54.0394973070018
Query=A    Hit=C    pident=60.6451612903226
Query=A    Hit=C    pident=67.0731707317073

When I try your code, the output is:

Query=A    Hit=B    pid=46.6666666666667
Query=A    Hit=B    pid=54.0394973070018
Query=A    Hit=C    pid=60.6451612903226
Query=A    Hit=C    pid=67.0731707317073

Im lost.

ADD COMMENT
0
Entering edit mode

Show us the contents of your blast file and the script you are executing (with cat blast.res, etc.). The answer is that your script is probably reading something else. Also, show the output of perl -MBio::SearchIO -e 'print Bio::SearchIO->VERSION'. When you respond, do so as a comment to this, not as a new answer. Thanks.

ADD REPLY
0
Entering edit mode

Hi SES, here are the contests:

$cat blast.res
A B 44.00 150 76 4 1852 1994 673 821 1e-25 116
A B 44.34 557 235 21 1082 1600 46 565 7e-126 427
A C 56.77 155 58 3 1451 1600 1719 1869 5e-48 191
A C 65.85 82 25 2 1913 1991 2126 2207 8e-26 118

$cat test.pl
#!use/bin/perl
use strict;
use warnings;
use Bio::SearchIO;
use Bio::SeqIO;

my $blastin = Bio::SearchIO->new(
    -file   => 'blast.res',
    -format => 'blasttable'
);
while ( my $result = $blastin->next_result ) {
    while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {

            print "Query=", $result->query_name,"\tHit=", $hit->name,
            "\tpident=",$hsp->percent_identity,"\n";

        }

    }
}

$perl -MBio::SearchIO -e 'print Bio::SearchIO->VERSION' 
1.006901
ADD REPLY
0
Entering edit mode

I should have asked you how you are executing the script because that must be where you are going wrong. I just ran this and got exactly what you would expect:

Query=A    Hit=B    pident=44
Query=A    Hit=B    pident=44.3447037701975
Query=A    Hit=B    pident=56.7741935483871
Query=A    Hit=B    pident=65.8536585365854

A couple of minor points, your shebang line is wrong (there is no "/use/bin/"), and you should avoid naming things "test.pl" because I think there is a test script with that name on some systems.

ADD REPLY
0
Entering edit mode

Hi SES,

Thanks for the minor corrections. I already change the shebang line and rename test.pl to chq2.pl and test2.pl to ses.pl. Here are the instruction to execute the script:

chodar@housefly:~/Documentos/cleandir$ perl chq2.pl
Query=A Hit=B   pident=46.6666666666667
Query=A Hit=B   pident=54.0394973070018
Query=A Hit=C   pident=60.6451612903226
Query=A Hit=C   pident=67.0731707317073

chodar@housefly:~/Documentos/cleandir$ perl ses.pl
Query=A Hit=B   pid=46.6666666666667
Query=A Hit=B   pid=54.0394973070018
Query=A Hit=C   pid=60.6451612903226
Query=A Hit=C   pid=67.0731707317073
ADD REPLY

Login before adding your answer.

Traffic: 1491 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6