4.9 years ago
shalinikaushik1293
I am running the following BLAST command:
blastp -query /Users/shalini/Desktop/shalini/project/unmodelled_fasta/AAA18895.fasta -outfmt "7 sacc qcovs pident ppos evalue" -db=/Users/shalini/Downloads/nr -out=/Users/shalini/Desktop/shalini/project/blast_resultnr
After running for an hour and a half, it fails with:
Error memory mapping:/Users/shalini/Downloads/nr.79.phr openedFilesCount=251 threadID=0
BLAST Database error: Cannot memory map /Users/shalini/Downloads/nr.79.phr. Number of files opened: 251
I am working on macOS. I also built the nr database myself, which resulted in 114 .phr, .pin, .psq and .pog files.
Please help me if anyone knows how to fix this.
I am not sure, but this looks like a memory error. Please mention how much RAM is available on your system.
The Mac I am using has 16 GB of RAM.
The `nr` database currently contains 142 files. Not sure why you built your own database instead of downloading the pre-built indexes. Are you using the latest `blast+` software?
You also should not be using `=` in program options. I am surprised it seems to be working, e.g. `-db=/Users/shalini/Downloads/nr -out=`
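A sketch of the command from the question with a space, rather than `=`, between each option and its value. The paths are the ones given in the question; the variable names and the guard around the call are illustrative additions, there only so the snippet does nothing harmful on a machine without BLAST+ or without the query file.

```shell
# Paths copied from the question; adjust them for your own system.
query=/Users/shalini/Desktop/shalini/project/unmodelled_fasta/AAA18895.fasta
db=/Users/shalini/Downloads/nr          # basename of the BLAST database, no "=" and no file extension
out=/Users/shalini/Desktop/shalini/project/blast_resultnr

# Run the search only if blastp is installed and the query file exists.
if command -v blastp >/dev/null 2>&1 && [ -f "$query" ]; then
  blastp -query "$query" \
         -db "$db" \
         -outfmt "7 sacc qcovs pident ppos evalue" \
         -out "$out"
else
  echo "blastp or the query file was not found; nothing was run" >&2
fi
```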
Instead of using bold, please use the code button to present your code/errors so they are readable. I've done it for you this time. Thank you!
Thank you for your reply and suggestion. I will use the code button from next time. Yes, the command line also runs with the `=` sign. I didn't use the pre-built indexes because I could not find the complete nr database; it is divided into parts (like nr.00.tar.gz, nr.01.tar.gz, etc.). So I found a link (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz) from which I only got the nr file, not its alias or index files. That's why I built the database using makeblastdb.
Yes, I am using the latest blast+ software.
I posted a link for the pre-built `nr` database in my last comment. You need to download all the nr files from there (no need to get the md5 sum files) and then uncompress them in one directory. You may have an incomplete index (did you check to make sure there were no errors when you built the index?). 16 GB is probably not enough RAM for `nr` searches, but if you have a large swap space defined it may work.
Even if `=` is working, please don't use that method.
That means I need to use an external disk for the nr database, because the results will also need space in future, as I am running BLAST for ~7000 proteins. Yes, I am sure there were no errors when I built the index. Can you please tell me which of the nr database files (from the link you sent me) contains the FASTA file? I have downloaded 5 to 6 of them and, on untarring, they give only index files. We need a FASTA file to pass to `-db` in the blastp command.
No. `-db` has to point to the basename of the BLAST index being used. In this case it is `nr`.
Using an external spinning disk will slow everything down. With 7000 proteins you will want to use BLAST options smartly to get the data you need. Have you considered using the `-remote` option to run the BLAST searches remotely at NCBI? You could batch the proteins in groups of 10 or 15.
You will have to download all of these files: ftp://ftp.ncbi.nih.gov/blast/db/nr.00.tar.gz, ftp://ftp.ncbi.nih.gov/blast/db/nr.01.tar.gz, ftp://ftp.ncbi.nih.gov/blast/db/nr.02.tar.gz, and so on up to ftp://ftp.ncbi.nih.gov/blast/db/nr.142.tar.gz. Extract all of them and then run your command. Just provide the path of the directory where you extracted these 143 pre-built BLAST database volumes, along with the basename. For instance, if you extract these archives into the /Users/shalini/Downloads/ directory, then on the command line write "-db /Users/shalini/Downloads/nr".
However, coming to your main issue, I don't think you will be able to run BLAST with your current system configuration, as it has only 16 GB of RAM and BLAST requires at least 32-64 GB of RAM. If you really want to do it on your current system, you can download the FASTA file for the nr database from the link genomax provided, split this FASTA file into small subsets, make a BLAST database for each subset, and then run BLAST against each subset. There is one issue with this approach: your E-values will be strongly affected, because the E-value depends on the size of the search space, and each subset is a much smaller search space than the full database.
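The download-and-extract step described in the answer above could be sketched roughly as follows. The URL pattern and the highest volume number (142) are taken from the answer; the number of volumes changes over time, so check the FTP listing first. The `DB_DIR` and `DOWNLOAD` variables and the `volume_urls` function are hypothetical names introduced for this sketch. BLAST+ also ships an `update_blastdb.pl` script that can fetch and decompress the pre-built databases, which may be more convenient than a hand-written loop.

```shell
# Hypothetical target directory; in the thread this would be /Users/shalini/Downloads.
db_dir="${DB_DIR:-./blastdb}"
last_volume=142                              # highest volume number mentioned in the answer
base_url="ftp://ftp.ncbi.nih.gov/blast/db"

# Print one URL per volume archive: nr.00.tar.gz ... nr.142.tar.gz
volume_urls() {
  i=0
  while [ "$i" -le "$last_volume" ]; do
    printf '%s/nr.%02d.tar.gz\n' "$base_url" "$i"
    i=$((i + 1))
  done
}

if [ "${DOWNLOAD:-0}" = "1" ]; then
  # Fetch every archive and unpack it into the same directory.
  mkdir -p "$db_dir"
  volume_urls | while read -r url; do
    archive="$db_dir/${url##*/}"
    curl -fsS -o "$archive" "$url" && tar -xzf "$archive" -C "$db_dir"
  done
else
  # Dry run: show the first few URLs without downloading anything.
  volume_urls | head -n 3
fi
```

With everything unpacked into one directory, the database is then addressed by its basename, for example `-db "$db_dir/nr"`.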
Thank you @prince26121991 for the help.
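For the ~7000 query proteins, the batching idea suggested earlier in the thread for `-remote` searches could look something like this. It is only a sketch: the function names, the file naming scheme and the batch size are assumptions made for illustration, and it assumes all query proteins have been concatenated into a single multi-FASTA file.

```shell
# Split a multi-FASTA file into batches of at most N sequences each.
# Usage: split_fasta INPUT.fasta N PREFIX  ->  PREFIX_0001.fasta, PREFIX_0002.fasta, ...
split_fasta() {
  awk -v n="$2" -v prefix="$3" '
    /^>/ {
      # Start a new output file every n header lines.
      if (count % n == 0) {
        if (out != "") close(out)
        out = sprintf("%s_%04d.fasta", prefix, count / n + 1)
      }
      count++
    }
    out != "" { print > out }
  ' "$1"
}

# Search each batch against nr on the NCBI servers instead of a local database.
# Usage: blast_batches_remote PREFIX
blast_batches_remote() {
  for f in "$1"_*.fasta; do
    [ -f "$f" ] || continue
    blastp -query "$f" -db nr -remote \
           -outfmt "7 sacc qcovs pident ppos evalue" \
           -out "${f%.fasta}.blast.txt"
  done
}
```

For example, `split_fasta all_proteins.fasta 10 batch` followed by `blast_batches_remote batch` would submit the proteins ten at a time; `all_proteins.fasta` is a placeholder file name.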