Hey all,
I'm trying to process a large transcriptome FASTA file with just over 53k sequences. I set up a local BLAST+ instance and the latest nr database (up to nr.25). I started blastx with the following command: blastx -query fasta.fa -out blastx.xml -outfmt 5 -eval 1e-3 -num_threads 32
So far it has processed only 3500 sequences in 2 days. It's a fairly decent workstation: 2x Xeon E2560 V2 with 128GB of RAM. From our previous experience the whole run shouldn't take more than a few days, but at this rate it would need roughly 30 days to finish. The output is also quite large for only 3500 sequences: it's already at 1.5GB.
How can I optimize the blastx run for import into Blast2GO? I'm currently reading up on how to parallelize blastx, but I'm not sure if there are better options out there.
Thanks!
How is that possible? You didn't even specify a database (-db). Also, it would make sense to opt for refseq_protein over nr, since the non-RefSeq sequences in nr probably can't be linked to GO terms anyway (I could be wrong).
Sorry, I forgot to include that option in this post. The command I actually ran did use the nr database.
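On the parallelization question, one common approach is to split the query FASTA into chunks and run an independent blastx job per chunk, instead of relying on a single job with -num_threads 32. Below is a minimal Python sketch of that idea, not a tested pipeline. The helper names (read_fasta, split_fasta, blastx_command), the chunk size of 1000, the 4 threads per job and the cap of 20 hits per query are all illustrative assumptions, not values from this thread. The command builder uses the standard BLAST+ options -db, -evalue and -max_target_seqs. Capping the hits per query is the likely lever for the output size, since the XML report otherwise keeps up to the BLAST+ default number of hits for every query.

```python
from itertools import islice
from pathlib import Path


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA file handle."""
    header, seq = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(fasta_path, out_dir, per_chunk=1000):
    """Write consecutive groups of `per_chunk` records to separate files.

    Returns the list of chunk paths in order. `per_chunk` is an
    illustrative default, not a recommendation from the thread.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with open(fasta_path) as handle:
        records = read_fasta(handle)
        while True:
            chunk = list(islice(records, per_chunk))
            if not chunk:
                break
            path = out_dir / f"chunk_{len(chunk_paths):04d}.fa"
            with open(path, "w") as out:
                for header, seq in chunk:
                    out.write(f"{header}\n{seq}\n")
            chunk_paths.append(path)
    return chunk_paths


def blastx_command(chunk_path, db="nr", threads=4, max_hits=20):
    """Build (but do not run) a blastx argument list for one chunk.

    `threads` and `max_hits` are assumed example values.
    """
    chunk_path = Path(chunk_path)
    return [
        "blastx",
        "-query", str(chunk_path),
        "-db", db,  # e.g. "nr" or "refseq_protein"
        "-out", str(chunk_path.with_suffix(".xml")),
        "-outfmt", "5",  # XML, as in the original command
        "-evalue", "1e-3",
        "-max_target_seqs", str(max_hits),
        "-num_threads", str(threads),
    ]
```

Each argument list can be passed to subprocess.run, with several jobs running at once (for example 8 jobs of 4 threads each on a 32-thread machine). The per-chunk XML files can then be loaded into Blast2GO separately or merged first. Whether this beats one 32-thread job depends on disk and memory behaviour with a database the size of nr, so it is worth timing a couple of chunks before committing to the full run.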