Running A Standalone Batch Blast With Fasta File Using Perl, Can'T Get Results For All Sequences. How To Get Top Hits?
2
1
Entering edit mode
13.5 years ago

I'm running standalone blast locally. My blast query sequencess are in a fasta file called human_seqs.fasta, which comprise around 300 sequences. Being blasted against a much larger database called DB.fasta. I then take the Accession IDs of the blast results and save to another file called IDs.txt to later retrieve the full fasta sequences of the hits in the database. The way my code is at the moment, it does the BLAST ok, but it is only running the BLAST for the first sequence in the query file. So I get a list of IDs but only for the first human seq query, still another 300 or so un-blasted!. I can guess that there is something wrong with my loop/code structure, but I can't figure it out!

Also, I would like to just extract the Accession IDs of the top hits, so I guess the hit with the lowest 'e' value or highest score. I'm not sure how to do this. Thanks

#!/usr/bin/perl  -w


use strict;

use Bio::Tools::Run::StandAloneBlast;
use Bio::SearchIO;
use Bio::SeqIO;

my $in_file = human_seqs.fasta";
my $virus_file = DB.fasta";
my @params = (-p => 'blastp', -d => 'DB_proteins.fasta', -o => 'report.bls', -e     => '10' );

my $seqio_object = Bio::SeqIO->new(-file => $in_file);
my $virus_seqio_object = Bio::SeqIO->new(-file => $virus_file);  
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

my $blast_report = "";

open(my $out, '>', 'IDs.txt') or die "failed to open output for write: $!";  # open   output file for saving accession Ids

while (my $seq = $seqio_object->next_seq){
    print "Blasting: ", $seq->id, "...\n";
    $blast_report = $factory->blastall($seq);
    while (my $result = $blast_report->next_result){
        while (my $hit = $result->next_hit){
            $hit->description =~ m/\[(.*)\]/;
            my $virus = $1;    
            print $hit->name(), "\t",  $hit->significance(), "\t", $hit-> accession(),"\t", $virus, "\n";

         my $accession_ids= $hit-> accession();

print { $out }   $accession_ids,"\n";   # save accession IDs to file IDs.txt for passing to fastacmd

                }
}
close($out);

system("fastacmd -d DB.fasta -i IDs.txt -p T -o full_seqs_hits.fasta");    # use fastacmd via the cmd line to pullout full seqs

exit;
}
blast perl bioperl • 7.8k views
ADD COMMENT
0
Entering edit mode

and what happens when you manually remove the first sequence from the query file? the second one returns a result? looks like you just need more debugging statements and you'll get to the bottom of this

ADD REPLY
0
Entering edit mode

Hi James! Please don't forget to accept the answer that solved your problem.

ADD REPLY
3
Entering edit mode
13.5 years ago
brentp 24k

You have an exit; inside your loop. It seems you want to move the final "}" up to the line before "close($out)".

EDIT: And it looks like you can get the best blast hit by adding "-best_hit_only => 1" to your params. From the commandline, it would be -K 1.

ADD COMMENT
0
Entering edit mode

Thank you, that worked :-) Uisng -K 1 still gives me more than one hit though. Eg for the first query sequence I seem to get the top 3 hits!

ADD REPLY
0
Entering edit mode

I think I've found it -v1 -b1 looks like it does the job

ADD REPLY
0
Entering edit mode
13.5 years ago

I'd suggest looking at the sequence file. I have noticed that for some sequences the fasta definition line runs into the sequence data and this gives an "empty" sequence that returns no hits. Unlikely to be your program here, but worth a quick check.

ADD COMMENT

Login before adding your answer.

Traffic: 2137 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6