Why does BLAST run slower on a Windows PC than on Linux? Does MS-DOS (cmd) have a maximum core/RAM limit?
8.7 years ago

I want to run BLAST, which runs on Linux as well as on Windows. On Windows, however, it runs through the command line (cmd = MS-DOS). I can set the number of processors in the parameters, but on Windows it still seems a bit slow. Is there any limit in MS-DOS on the maximum RAM or the maximum number of processors used?

I want to run standalone BLAST against my downloaded copy of the NCBI nr database. I have 6 cores (12 logical processors) and 50 GB RAM (extendable to 256 GB) on Windows, and 10 cores (10 processors) and 200 GB RAM on Linux.

I am already running it on Windows with a query of 40,000 sequences. It has been running for a month and we still have only 15,000 sequences in the output file. We set the maximum number of hits to 1 and the output to tabular format to minimize the run time. We still have no idea how much longer it might take.

Is there any way to speed up the process, or any form of BLAST that reports the progress of the task? We don't have financial support, so cloud BLAST is not an option for us.

blast ncbi linux windows cmd • 3.3k views

(assuming the Windows version is 64-bit)

It is unlikely to be about the OS, and more about the hardware :(

10 cores is a special number - it's probably a new-ish Xeon Ivy Bridge chip. That will definitely have hyperthreading (for 20 virtual cores) and huge amounts of cache per CPU, not to mention 4x as much main memory. Just the CPU of that Linux machine probably costs as much as the entire Windows computer.

But you can of course test this by booting the Windows machine from a Linux live CD/DVD and seeing whether BLAST runs significantly faster there than it did on Windows. Linux is free, after all, so finances should not be an issue :)


The 24-core PC has been divided between Linux and Windows by Hyper-V. The main OS is Windows. That is how we can extend the Windows RAM: by shutting down Linux.


Hehehe, OK... that might have had some relevance to your question, you know ;)

I still don't understand what combination of 6, 10 and 12 makes 24, but it now seems like either a RAM thing, a 32-bit thing, or a BLAST software thing. I appreciate that isn't a particularly helpful answer.


The main system has 24 processors. Several are hidden and/or used by Hyper-V itself. I am reporting the details as shown in the system properties on Windows and Linux: Windows Server shows 12 processors and Linux shows 10 processors.


It would probably have been faster to split the input file in half and run 2 parallel BLAST jobs with num_threads 6 each. I think BLAST processes query sequences in order, so to see how many sequences you have already processed, you can run:

tail -n 1 yourTabularBlastOutputFile

Then check in your query file how many sequences come before the one on that last line.
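The same check can be scripted so it also works on Windows, where `tail` is not available by default. This is only a minimal sketch, not part of BLAST itself: it assumes the first column of the tabular output is the query ID, that this ID equals the first word of the FASTA header, and that BLAST reports queries in input order. The function names and the `eta_days` estimate (which assumes a constant processing rate) are hypothetical helpers made up for illustration.

```python
def fasta_ids(fasta_path):
    """Return query IDs (first word of each '>' header) in file order."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                words = line[1:].split()
                ids.append(words[0] if words else "")
    return ids


def last_query_id(tabular_path):
    """Return column 1 of the last non-empty, non-comment line, or None."""
    last = None
    with open(tabular_path) as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                last = line.split("\t", 1)[0].strip()
    return last


def progress(fasta_path, tabular_path):
    """Return (position of the last reported query, total number of queries)."""
    ids = fasta_ids(fasta_path)
    last = last_query_id(tabular_path)
    done = ids.index(last) + 1 if last in ids else 0
    return done, len(ids)


def eta_days(done, total, elapsed_days):
    """Estimate remaining days, assuming the rate so far stays constant."""
    if done == 0:
        return None
    return elapsed_days * (total - done) / done
```

Under the constant-rate assumption, the figures from the question (about 15,000 of 40,000 queries after roughly 30 days) would give `eta_days(15000, 40000, 30)`, i.e. about 50 more days. Queries without any hit do not appear in the output, so the position of the last reported query is the more useful number, not the count of distinct IDs in the output file.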


Does that mean the Windows command line cannot use all available cores and RAM?


It means that BLAST does not benefit much from multi-threading beyond a few threads. I usually split large input files and run numerous parallel num_threads 8 jobs on our cluster. See e.g. this. If you have the RAM, the best option with e.g. 12 cores might be 12 parallel num_threads 1 jobs or 6 parallel num_threads 2 jobs instead of 1 num_threads 12 job. But yeah, this depends on available RAM, cache size, disk I/O speed, etc.
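Splitting the query file can be done with a few lines of Python, which runs the same way on Windows and Linux. The sketch below is an illustration under stated assumptions, not a tested pipeline: `split_fasta`, `read_fasta_records` and the `chunk_N.fasta` naming are made up here, and the whole query file is assumed to fit in memory (40,000 sequences should). Chunks are contiguous, so the order within each chunk matches the original file.

```python
def read_fasta_records(path):
    """Yield each FASTA record (header line plus sequence lines) as one string."""
    record = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">") and record:
                yield "".join(record)
                record = []
            record.append(line)
    if record:
        yield "".join(record)


def split_fasta(path, n_parts, prefix="chunk"):
    """Write records into up to n_parts files of near-equal size; return names."""
    records = list(read_fasta_records(path))
    size = -(-len(records) // n_parts)  # ceiling division
    names = []
    for i in range(n_parts):
        part = records[i * size:(i + 1) * size]
        if not part:
            break  # fewer records than requested parts
        name = f"{prefix}_{i + 1}.fasta"
        with open(name, "w") as out:
            out.writelines(part)
        names.append(name)
    return names
```

Each chunk would then go to its own BLAST job through `-query`, with its own `-out` file and a small `-num_threads` value, and the tabular outputs can be concatenated afterwards. How many jobs fit at once depends on RAM, since every job needs its own working memory for the database.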


I am sure that test was done on Linux. As far as I know, the Windows command line (DOS) cannot use more than 1 processor. I have lost the reference for this, so I am just asking for confirmation.
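A claim like this can be checked directly instead of from memory. As a rough sketch, the script below, when started from a cmd window, prints how many logical processors the operating system reports to a program launched there. `visible_cpus` is a name made up for this example, and the count is what the OS reports, which is not necessarily what a given BLAST run will actually keep busy.

```python
import os


def visible_cpus():
    """Return the number of logical processors the OS reports, or None if unknown."""
    return os.cpu_count()


if __name__ == "__main__":
    print("Logical processors visible to this process:", visible_cpus())
```

If the number printed from cmd matches the one in the Windows system properties (12 in this thread), the command line itself is not what restricts the job to one processor.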
