8.8 years ago
hdy
I am running a blastn search on a local machine (4 cores, 8 threads, 8 GB of memory). My database is the NCBI bacterial, archaeal, and viral RefSeq, which is about 200 GB. Memory usage is very high: 7711M used, ~14M unused, and VM usage is 395G. CPU usage, however, is very low: 1.6% usr, 2.85% sys, ~95% idle. Is this normal?
The blastn command I used is:
blastn -db "bacteria_genomic_74 viral_genomic_74 archaea_genomic_74" \
-query input_file \
-out output_file \
-outfmt 6 \
-num_threads 8 \
-max_target_seqs 5
The blastn entry in top shows: CPU ~9%, #TH 16, MEM 2304K.
Sorry, I cannot get a snapshot of the top output: the computer is busy and not responding very quickly, so I do not want to mess with it.
Is "bacteria_genomic_74 viral_genomic_74 archaea_genomic_74" the exact filename prefix of a single BLAST database, or do you have three separate BLAST databases?
Three separate ones. The bacteria database is the largest at ~200G; the other two are pretty small.
If you don't have enough memory, the system spends most of its time paging data between RAM and disk, and the CPU sits idle waiting for that I/O; that's why you see low CPU usage. You need to use less memory. You could split the database into several smaller databases and run BLAST against each, or you could run cd-hit to reduce the size of your database. There is probably also some BLAST version (or analogous software) that does not load the whole database into memory; I would look for that.
So if the database is very large, for example whole eukaryotic genomes, what people do is split the database into smaller pieces, BLAST the query against each piece, and merge all the results at the end. Is this right?
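For what it's worth, here is a minimal sketch of the merge step, assuming each piece was searched separately with -outfmt 6 and the results were written to one file per piece. The function name merge_top_hits and the file names in the usage comment are made up for illustration. It ranks by bit score (column 12 of the default tabular output) rather than e-value, because e-values depend on the size of the database searched and so are not directly comparable between pieces.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAST): merge several tabular
# (-outfmt 6) result files and keep the best N hits per query,
# ranked by bit score, which is column 12 in the default layout.
# Usage: merge_top_hits N file1 [file2 ...]
merge_top_hits() {
    n="$1"
    shift
    tab="$(printf '\t')"
    # Sort by query ID (column 1), then by bit score, highest first.
    cat "$@" \
        | LC_ALL=C sort -t "$tab" -k1,1 -k12,12nr \
        | awk -F '\t' -v n="$n" '{ if (++seen[$1] <= n) print }'
}

# Example usage (output file names are assumptions):
#   for db in bacteria_genomic_74 viral_genomic_74 archaea_genomic_74; do
#       blastn -db "$db" -query input_file -out "out_$db.tsv" \
#              -outfmt 6 -num_threads 8 -max_target_seqs 5
#   done
#   merge_top_hits 5 out_*.tsv > merged.tsv
```

If e-values comparable to a single whole-database search matter, blastn has a -dbsize option for setting the effective database length; check the manual for your version before relying on it.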
Or better, just follow the BLAST advice:
Go here