I'm running standalone blast locally. My blast query sequencess are in a fasta file called human_seqs.fasta, which comprise around 300 sequences. Being blasted against a much larger database called DB.fasta. I then take the Accession IDs of the blast results and save to another file called IDs.txt to later retrieve the full fasta sequences of the hits in the database. The way my code is at the moment, it does the BLAST ok, but it is only running the BLAST for the first sequence in the query file. So I get a list of IDs but only for the first human seq query, still another 300 or so un-blasted!. I can guess that there is something wrong with my loop/code structure, but I can't figure it out!
Also, I would like to just extract the Accession IDs of the top hits, so I guess the hit with the lowest 'e' value or highest score. I'm not sure how to do this. Thanks
#!/usr/bin/perl -w
use strict;
use Bio::Tools::Run::StandAloneBlast;
use Bio::SearchIO;
use Bio::SeqIO;
my $in_file = human_seqs.fasta";
my $virus_file = DB.fasta";
my @params = (-p => 'blastp', -d => 'DB_proteins.fasta', -o => 'report.bls', -e => '10' );
my $seqio_object = Bio::SeqIO->new(-file => $in_file);
my $virus_seqio_object = Bio::SeqIO->new(-file => $virus_file);
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
my $blast_report = "";
open(my $out, '>', 'IDs.txt') or die "failed to open output for write: $!"; # open output file for saving accession Ids
while (my $seq = $seqio_object->next_seq){
print "Blasting: ", $seq->id, "...\n";
$blast_report = $factory->blastall($seq);
while (my $result = $blast_report->next_result){
while (my $hit = $result->next_hit){
$hit->description =~ m/\[(.*)\]/;
my $virus = $1;
print $hit->name(), "\t", $hit->significance(), "\t", $hit-> accession(),"\t", $virus, "\n";
my $accession_ids= $hit-> accession();
print { $out } $accession_ids,"\n"; # save accession IDs to file IDs.txt for passing to fastacmd
}
}
close($out);
system("fastacmd -d DB.fasta -i IDs.txt -p T -o full_seqs_hits.fasta"); # use fastacmd via the cmd line to pullout full seqs
exit;
}
and what happens when you manually remove the first sequence from the query file? the second one returns a result? looks like you just need more debugging statements and you'll get to the bottom of this
Hi James! Please don't forget to accept the answer that solved your problem.