Question

Process 50k+ sequences with Blastx for Blast2Go

2

Entering edit mode

11.0 years ago

satshil.r ▴ 50

Hey all,

I'm trying to process a large transcriptome fasta file with over 50k sequences (just a bit over 53k). I setup a local blast+ instance and the latest nr database (up to nr.25). I started blastx with the following command: blastx -query fasta.fa -out blastx.xml -outfmt 5 -eval 1e-3 -num_threads 32

So far it's processed only 3500 sequences over 2 days. It's a fairly decent workstation, 2x Xeon E2560 V2's with 128GB of ram. From our previous experience this shouldn't take over a few days, although at this rate it seems like it's going to take a long time. The output is also quite large for only 3500 sequences, it's already at 1.5GB.

How can I optimize blastx for importing into blast2go? I'm currently reading up on how to parallelize blastx, but I'm not sure if there are better options out there.

Thanks!

blastall blast2go blastx • 4.4k views

ADD COMMENT • link updated 3.7 years ago by Ram 45k • written 11.0 years ago by satshil.r ▴ 50

0

Entering edit mode

I started blastx with the following command: blastx -query fasta.fa -out blastx.xml -outfmt 5 -eval 1e-3 -num_threads 32

How is that possible? You didn't even define a db. Also, it would make sense to opt for refseq_protein over nr since the non-refseq seqs in nr probably can't be linked to go terms anyway (I could be wrong).

ADD REPLY • link updated 3.7 years ago by Ram 45k • written 11.0 years ago by 5heikki 11k

0

Entering edit mode

Sorry, I forgot to include that command in this post. I used the nr database in the commands.

ADD REPLY • link updated 3.7 years ago by Ram 45k • written 11.0 years ago by satshil.r ▴ 50

Ram · Answer 1 · 2014-07-16

1

Entering edit mode

11.0 years ago

rtliu ★ 2.2k

My suggestion based on page 10 of blast2go manual

use -max_target_seqs 10
use -word_size 5 (default 3, more sensitive but slower)

It is a big task, be patient or find to a cluster like TACC. (https://wikis.utexas.edu/display/bioiteam/split_blast)

ADD COMMENT • link updated 3.7 years ago by Ram 45k • written 11.0 years ago by rtliu ★ 2.2k

0

Entering edit mode

Thanks for the help!

How long do you expect it should take to finish a job like this?

ADD REPLY • link 11.0 years ago by satshil.r ▴ 50

0

Entering edit mode

Hard to say, maybe 5-10 weeks if the workstation is dedicated to the blastx search.

ADD REPLY • link 11.0 years ago by rtliu ★ 2.2k