Changing Temporary File Location for Local BLASTP to nr Database

I am attempting to run a local blastp search against the pre-indexed version of the entire protein nr database on an Ubuntu instance. The problem I am having is that the temporary files are not going onto a storage drive that I have mounted, and I run out of memory very quickly even though the input and output directories are on large storage drives.

How do we change the buffer/temp folder for the blastp command, and where might this folder be located on the system?

Here is the example code I used. Downloading the pre-indexed version of nr:

sudo perl update_blastdb.pl --passive --decompress nr

Searching nr with query protein fasta file:

blastp -query /mnt/myquery.fasta -db /mnt/blastdb/nr -outfmt 6 -out ./myresults.out

Where the blastp command is located:

kostrouchov@myip:~$ which blastp
/usr/bin/blastp
ADD COMMENT

It is unclear whether you are having a problem while downloading the indexes or during the actual run. nr is over 100 GB of files.

ADD REPLY

Yes, the pre-indexed nr downloaded successfully to a mounted storage location. Running blastp to search the nr database fills up the /dev/root/ storage (100 GB), which must be where the command writes its temporary files. My instance has 512 GB of RAM.

ADD REPLY

First of all, memory and storage are different things. If you run out of storage, you have to ask the sysadmin to allocate more for your needs. If you are working on a shared machine, it could be that your coworkers are hammering the same network drive as you and it is just painfully slow or full.

What you can always do is split the query file into smaller ones and run them separately; that can reduce your memory usage (see the sketch after the swap example below). Furthermore, if you run out of memory, the kernel will use swap storage. Check whether your system has a swap file: swapon -s. If that is not enough, or no swap is allocated at all, allocate the desired size; there are entire guides out there on how much swap you should allocate:

sudo dd if=/dev/zero of=/swapfile bs=1M count=32768    (32768 blocks of 1 MiB = 32 GiB)
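
For completeness: on a typical Linux setup the new swap file also has to be locked down, formatted and enabled before the kernel will use it (same path as in the dd example above):

sudo chmod 600 /swapfile   # restrict access; 0600 is the recommended permission for swap files
sudo mkswap /swapfile      # format the file as swap
sudo swapon /swapfile      # enable it, then verify with swapon -s

And here is a minimal sketch of the query-splitting idea mentioned above, using plain bash and awk. The chunk size of 500 sequences and the chunks/ directory are arbitrary choices, and the paths are taken from the question:

mkdir -p chunks
# write every block of 500 query sequences into its own numbered FASTA file
awk '/^>/ { if (n % 500 == 0) { if (f) close(f); f = sprintf("chunks/chunk_%04d.fasta", int(n/500)) } n++ } { print > f }' /mnt/myquery.fasta
# search each chunk separately so a single blastp process handles less at a time
for f in chunks/chunk_*.fasta; do
    blastp -query "$f" -db /mnt/blastdb/nr -outfmt 6 -out "$f.out"
done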

Another option is to adjust the BATCH_SIZE for the BLAST run, but I have never done this myself. Here you can check out how to optimize memory usage during a BLAST run: blast memory usage
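
If I read the linked thread correctly, BATCH_SIZE is a BLAST configuration setting that controls how many query letters are processed per batch, so lowering it should lower the memory needed per batch. I have not tested this, and the value below is only an illustration:

BATCH_SIZE=10000 blastp -query /mnt/myquery.fasta -db /mnt/blastdb/nr -outfmt 6 -out ./myresults.out    (untested)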

Finally, on a Unix machine the default temporary directory is /tmp. But if BLAST runs out of memory during the run, the overflow won't end up as temp files in your temporary directory but rather in swap space.

Anyway: check how much space is occupied in /tmp: du -hc --max-depth=0 /tmp. Check how much space is left on the disk holding it: df -h /tmp/
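
If you want to try redirecting temporary files away from /tmp, the usual Unix convention is the TMPDIR environment variable. I have not verified whether blastp honours it, but pointing it at the mounted drive from the question costs nothing to try:

mkdir -p /mnt/tmp
TMPDIR=/mnt/tmp blastp -query /mnt/myquery.fasta -db /mnt/blastdb/nr -outfmt 6 -out ./myresults.out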

And what always helps is running htop. This tool shows memory, system, and swap usage and more in real time. So run your blast query in tmux, then run htop to see what is causing the error.
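
For example (the session name is arbitrary):

tmux new -s blastrun    # start a session and launch the blastp command inside it
# detach with Ctrl-b then d so the run keeps going in the background
htop                    # watch memory, swap and per-process usage in real time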

ADD REPLY

A workaround I found is to download the full nr.gz FASTA from NCBI and use diamond blastp instead, since its documentation covers changing the temporary file location ("--tmpdir") and the batch size ("--index-chunks", "--block-size"):

sudo diamond blastp --db /mnt/myprots.fasta --query /mnt/nr.gz --out /mnt/dmd_res.out --outfmt 6 --tmpdir /mnt/tmp/ --index-chunks 1 --block-size 10
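
One note for anyone copying this: in the DIAMOND versions I have used, --db expects a database built with diamond makedb rather than a plain FASTA file, so a step along these lines (using the file names from the command above) is usually needed first:

diamond makedb --in /mnt/myprots.fasta --db /mnt/myprots    (then pass --db /mnt/myprots to diamond blastp)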
ADD REPLY
