Speeding Up The Blast Job
10.7 years ago
User000 ▴ 710

Hello, I am trying to run BLASTX with my contig database against the UniProt/TrEMBL protein database. It takes ~20 min per contig and I have thousands of them, so in total it is going to take me 2 months! Do you know of a way to speed up the blastx job? Note: I am already using clusters and I have already split my contigs into smaller files. This is the command line I use:

blastall -p blastx -e 0.001 -m 8 -S 1 -i input.fasta -d trembl.fasta -o output

It looks like you are using legacy BLAST. Did you try BLAST+? What is your computer setup: how many nodes, CPUs, etc.? How did you split your contigs? Are all cores already running at 100% load all the time? If not, there is a tutorial on using GNU Parallel with BLAST on this site: GNU Parallel - parallelize serial command line programs without changing them
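
If GNU Parallel cannot be installed, a rough stand-in for the same idea (keep a fixed number of BLAST jobs running until every chunk is done) can be sketched with the Python standard library. This is not GNU Parallel itself; the helper names `run_all` and `blastall_commands` and the chunk file names are hypothetical, and the blastall options are simply copied from the question.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def blastall_commands(chunks):
    """Build one blastall argument list per chunk file.

    The options mirror the command in the question; the ".out" suffix
    for the result files is an assumption made for this sketch.
    """
    return [
        ["blastall", "-p", "blastx", "-e", "0.001", "-m", "8", "-S", "1",
         "-i", chunk, "-d", "trembl.fasta", "-o", chunk + ".out"]
        for chunk in chunks
    ]


def run_all(commands, max_workers):
    """Run the commands with at most max_workers of them active at once.

    Returns the exit codes in the same order as the commands.
    """
    def run_one(cmd):
        # Each worker thread only waits for one external process.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))
```

Calling `run_all(blastall_commands(chunk_files), max_workers=8)` would keep eight searches running at a time; the worker count should match the cores that are actually free.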

I haven't tried blast+ yet, since I am having difficulty downloading anything on this computer, so I am basically using what they already have installed. I split my 200000 transcripts into 7 files using a Python script, and I am running BLAST in remote mode in the background, sending the jobs to the computer clusters at the university. To finish in 6 days I would need to split my files into 250 parts.
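
The poster's splitting script is not shown, so the following is only one possible way to cut a FASTA file into a chosen number of parts. It assumes that search time grows roughly with query length, and therefore balances the parts by total sequence length (longest sequence first, always into the currently lightest part) rather than by sequence count. The function names and the output file naming are hypothetical.

```python
import heapq


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_records(records, n_parts):
    """Distribute records over n_parts lists with similar total length."""
    parts = [[] for _ in range(n_parts)]
    heap = [(0, i) for i in range(n_parts)]  # (total length so far, part index)
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, i = heapq.heappop(heap)       # lightest part so far
        parts[i].append((header, seq))
        heapq.heappush(heap, (total + len(seq), i))
    return parts


def write_parts(parts, prefix):
    """Write each part to <prefix>_NNN.fasta and return the file paths."""
    paths = []
    for i, part in enumerate(parts):
        path = "%s_%03d.fasta" % (prefix, i)
        with open(path, "w") as handle:
            for header, seq in part:
                handle.write("%s\n%s\n" % (header, seq))
        paths.append(path)
    return paths
```

With contigs ranging from a few hundred to tens of thousands of bases, balancing by length may keep the parts from finishing at very different times, but that depends on the assumption above holding for the data at hand.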

Some extra info might help. What version of BLAST are you using? What database are you blasting against (nr? Is it online or offline)? What is the length of your contigs? Etc.

I am using blastall, blasting my plant transcripts (~200000 sequences) against TrEMBL (~45 million protein sequences). My contig lengths range from ~300 bp to ~22000 bp, with an average of ~1500 bp.

blastall is the old version. Have you tried any blast+ version (i.e. 2.2.28+ or later)? It is significantly faster and allows for multicore usage.
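
For anyone who does get blast+ installed, the blastall call from the question could be translated roughly as in the sketch below. The helper name `blastx_plus_command` is hypothetical, the thread count is an assumption to be tuned per machine, and the sketch assumes the TrEMBL database is readable by blast+ under the same name.

```python
import shlex


def blastx_plus_command(query, db, out, threads=1):
    """Return a blast+ blastx argument list mirroring the blastall call."""
    return [
        "blastx",
        "-query", query,               # blastall -i
        "-db", db,                     # blastall -d
        "-out", out,                   # blastall -o
        "-evalue", "0.001",            # blastall -e 0.001
        "-outfmt", "6",                # tabular output, as with blastall -m 8
        "-strand", "plus",             # blastall -S 1 (top strand only)
        "-num_threads", str(threads),  # assumption: cores available per job
    ]


def as_shell_line(args):
    """Quote the argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(a) for a in args)
```

`as_shell_line(blastx_plus_command("input.fasta", "trembl.fasta", "output", threads=8))` gives a single command line that can be dropped into a cluster job script.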

I am using a Debian machine at the university, and it is impossible to download anything there: 1) it is so old that it needs an update, and 2) I have no root access. Any other suggestions? Of course, if there are no other possible solutions, I will do my best to follow the one you suggested.

Does anyone know how to check the progress, as in how many sequences are done blasting and how many remain? I am writing the output in format 5 (XML).

AFAIK you can't get an exact progress report, because the output is buffered. It is even worse with XML output, because the XML is not well-formed until the job is finished. Try the standard output format instead. I sometimes check progress using grep -ce "Query=" blastout, because this string occurs exactly once at the start of each result section.
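
The same check can be turned into a "done out of total" figure by also counting the header lines of the query file. The sketch below assumes the standard (pairwise) output format and a plain FASTA query file; the function names are hypothetical. Because the output is buffered, the count of finished queries should be read as a lower bound.

```python
def count_matching_lines(path, predicate):
    """Count the lines of a text file for which predicate(line) is true."""
    with open(path) as handle:
        return sum(1 for line in handle if predicate(line))


def blast_progress(query_fasta, blast_output):
    """Return (queries seen in the output so far, total queries)."""
    # Same test as grep -c "Query=": the string may appear anywhere in the line.
    done = count_matching_lines(blast_output, lambda line: "Query=" in line)
    # Every FASTA record starts with a ">" header line.
    total = count_matching_lines(query_fasta, lambda line: line.startswith(">"))
    return done, total
```

For example, `done, total = blast_progress("input.fasta", "blastout")` followed by `print("%d of %d queries" % (done, total))`.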

I have a big FASTA file with ORFs that I want to blast. If I have 32 cores on my workstation, does it make more sense to split the FASTA file and start separate BLAST tasks?

In other words, is it faster to split the FASTA file into 4 files and assign 8 cores per file using blastp, rather than giving the whole unsplit FASTA file all 32 cores?

I cannot explain exactly why, but in my experience the last option is faster. You could run a small benchmark if you want to find out for sure.
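
Such a benchmark could be as simple as timing both layouts on a small sample of the ORFs. The sketch below is one way to set it up, not a tested recipe for any particular machine: the helper names and file names are hypothetical, and it assumes blast+ `blastp` with its `-num_threads` option and a sample that has already been split into parts.

```python
import subprocess
import time


def blastp_command(query, db, out, threads):
    """Build a blast+ blastp argument list for one query file."""
    return ["blastp", "-query", query, "-db", db, "-out", out,
            "-num_threads", str(threads)]


def candidate_layouts(whole, parts, db, cores=32):
    """Return the two layouts from the question as lists of commands."""
    per_part = cores // len(parts)  # e.g. 32 cores over 4 parts gives 8 each
    return {
        "single": [blastp_command(whole, db, whole + ".out", cores)],
        "split": [blastp_command(p, db, p + ".out", per_part) for p in parts],
    }


def time_concurrent(commands):
    """Start all commands at once, wait for all of them, return seconds."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in commands]
    for proc in procs:
        proc.wait()
    return time.perf_counter() - start
```

Looping over `candidate_layouts("sample.fasta", part_files, "mydb").items()` and printing `time_concurrent(commands)` for each layout gives one wall-clock figure per option. Repeating the run a few times helps, since disk caching of the database can favour whichever layout runs second.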
