Hello everyone,
Is it possible to visualize the progress of a local BLAST+ run (a verbose mode or something similar, maybe with Python or Perl)?
Thanks in advance!
I don't think so, but if I'm wrong I would be happy, because I miss it too. What I do now (it works with tab-delimited output and a multi-FASTA query file) is find the last query ID written to the BLAST output, and then find that ID in the FASTA file. This works only because sequences are "blasted" in the same order as in the query file.
## 1st step: ID of the last query written to the BLAST output
tail -1 out.tab.blast | cut -f 1 > lastqueryblasted
## 2nd step: line number of that ID in the query file
grep -n -f lastqueryblasted query.fasta
# res1, e.g.: 23000
## last step: computing the percentage
wc -l query.fasta
# res2, e.g.: 33000
# your (under)estimated percentage
echo "(23000 / 33000) *100" | bc -l
This is not very convenient, I know, but it helps.
EDIT: I use this so often that I eventually wrapped it in a script.
If you name it blastab_monitor.sh, it works like this:
blastab_monitor.sh blastresult.blast queryfasta.fa
Here it is:
#!/bin/sh
### monitor a tabular-output blast job by giving the approximate % done
blast=$1
query=$2
echo "the blast out is: $blast"
echo "the fasta query is: $query"
echo
# ID of the last query written to the blast output
curquery=$(tail -1 "$blast" | cut -f 1)
# line number of that ID in the query file (fixed-string search, first match only)
curline=$(grep -F -n -- "$curquery" "$query" | head -n 1 | cut -f 1 -d ':')
# total number of lines in the query file
nblines=$(wc -l < "$query")
percent=$(echo "($curline/$nblines) *100" | bc -l | cut -c 1-4)
echo "The blast job is about $percent % done..."
I hope that helps.
input fasta file: input.fasta
regular blast output: blast_out.txt
totalcount=$(grep -c "^>" input.fasta); completed=$(grep -c "^Query=" blast_out.txt) ; percent=100*$completed/$totalcount ; echo $percent | bc -l
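The one-liner above can also be wrapped in a small function. This is only a sketch: the function name blast_report_progress is made up for illustration, and it assumes the default report format (outfmt 0), where each processed query starts a line with "Query=". It also guards against an empty FASTA file, which would otherwise cause a division by zero.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only).
# Reports how many queries already appear in a default-format BLAST+ report.
# usage: blast_report_progress input.fasta blast_out.txt
blast_report_progress() {
    fasta=$1
    report=$2
    # number of sequences in the query file
    total=$(grep -c '^>' "$fasta")
    # number of queries already written to the report
    done_n=$(grep -c '^Query=' "$report")
    if [ "$total" -eq 0 ]; then
        echo "no sequences found in $fasta" >&2
        return 1
    fi
    # awk does the floating-point arithmetic and the formatting
    awk -v d="$done_n" -v t="$total" \
        'BEGIN { printf "%d of %d queries (%.1f%%)\n", d, t, 100 * d / t }'
}
```

Called as blast_report_progress input.fasta blast_out.txt, it prints one status line per call.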
Here is a cross-platform solution in bash. The script indicates the percentage of rows in the input file consumed by blast in real time, which is a very good proxy of the process progress when the query is a large set of sequences. It's based on the idea that the input can be piped to blast, and the piping command can output the current status to a different stream.
Use as: blastprogress <your normal blast command with full set of options>
E.g.: blastprogress blastn -query my.fasta -num_threads 3 -outfmt 6 -evalue 1e-5 -db nr1 -out my.blast
blastprogress:
#!/bin/bash

die() {
    echo $1 >&2
    exit 1
}

## test whether the command is correct ##
[[ "$1" =~ blast* ]] || die "Not a blast program '$1'"
command -v "$1" >/dev/null 2>&1 || die "'$1' not found"

## grasp the -query argument to replace it with our pipe ##
for ((j=$#;j>0;j--)); do
    if [ "${!j}" == '-query' ]; then
        i=$((j-1)); k=$((j+1)); l=$((j+2))
        query=${!k}
        set -- "${@:1:i}" "${@:l}"
        break
    fi
done

## validate the query ##
[ -f "$query" ] || die 'Input file not found'
lines=$(wc -l < "$query")
((lines>0)) || die 'Input file is empty'

## we need these two strings to plot the progress bar ##
bar='===================================================================================================='
## 100 spaces, same width as the bar ##
blk=$(printf '%100s' '')

echo "Lines consumed:" >&2
printf '[%.*s] %d %%\r' 100 "$blk" 0 >&2

## ival is the number of rows corresponding to 1% ##
ival=$((lines/100))
((ival==0)) && ival=1

## we use awk to monitor the number of lines consumed by blast ##
awk='{ print } NR%'$ival'==0 { p=sprintf("%.f", NR*100/'$lines'); system("printf '"'[%.*s%.*s] %d %%\r'"' "p" '"'$bar'"' "(100-p)" '"'$blk'"' "p" >&2"); }'

## run blast ##
eval "$@" -query <(awk "$awk" "$query")
echo >&2
echo 'Done' >&2
On Unix systems you could use the "watch" command to print, for instance, the tail of your output file to the screen. You can make it fancier by including a grep to show just the query ID and calculate the % as RM suggested.
A simple solution, when only one hit is returned per query in tabular format, is to count the lines of the output.
watch tail -n 10 blast_out
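When several hits per query are reported, counting lines overestimates progress, so one option is to count distinct query IDs instead. The sketch below assumes tabular output with the query ID in the first column; the function name blast_tab_progress is made up for illustration. Queries without any hit never appear in tabular output, so the figure is a lower bound.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only).
# Estimates progress from tabular BLAST output by counting distinct query IDs.
# usage: blast_tab_progress query.fasta blast_out.tab
blast_tab_progress() {
    fasta=$1
    tab=$2
    # number of sequences in the query file
    total=$(grep -c '^>' "$fasta")
    [ "$total" -gt 0 ] || { echo "no sequences in $fasta" >&2; return 1; }
    # count distinct values of column 1, skipping empty lines and '#' comment lines
    seen=$(awk -F '\t' 'NF && !/^#/ && !($1 in ids) { ids[$1]; n++ }
                        END { print n + 0 }' "$tab")
    awk -v s="$seen" -v t="$total" \
        'BEGIN { printf "%d of %d queries have hits (%.1f%%)\n", s, t, 100 * s / t }'
}
```

Saved as a standalone script that calls the function on its arguments, it could be refreshed periodically with watch in the same way as the tail command above.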
Here's a Kotlin solution using kscript. The logic to estimate the progress is similar to the other solutions, but it's shorter and more portable (since it is Java-based).
blast_progress(){
kscript - "$@" <<"EOF"
//DEPS de.mpicbg.scicomp:kutils:0.3
//KOTLIN_OPTS -J-Xmx5g
import de.mpicbg.scicomp.bioinfo.openFasta
import java.io.File
import kotlin.system.exitProcess

// both the FASTA file and the tabular blast results are required
if (args.size < 2) {
    System.err.println("Usage: blast_progress <fasta> <blastresults>")
    exitProcess(-1)
}
val fastaFile = File(args[0])
val blastResults = File(args[1])
// IDs of all query sequences
val fastaIds = openFasta(fastaFile).map { it.id }
// distinct query IDs (first column) already present in the blast results
val procIds = blastResults.useLines { lines -> lines.map { it.split("\t")[0] }.distinct().toList() }
// scale the fraction to a percentage
val pcDone = 100.0 * procIds.size / fastaIds.size
println("Approximately ${pcDone} % of ${fastaIds.size} sequences were processed by blast.")
EOF
}
Sorry for the delay. I liked RM's approach, but there is a problem: the blast output file is not updated until the program finishes, so it's not possible to retrieve progress information in real time.
Any suggestions?
Instead of -out outputfile, use > outputfile; the redirected file might get updated in real time.
That did the trick :)
Update: I tried the script with HTML output, but it didn't work. Changing grep -c "^Query=" to grep -c "Query=" corrects that. I tried other outfmt options, but no luck :(
The script will work on default blast+ output (outfmt=0) and html format.
Also, I changed the end of script a bit too:
totalcount=$(grep -c "^>" input.fasta); completed=$(grep -c "^Query=" blast_out.txt) ; percent=100*$completed/$totalcount ; total=$(echo $percent | bc); echo From $totalcount sequences, $completed were processed '('$total%')' :')'
Thanks for the support!