I want to run BLAST, which is available for Linux as well as Windows. On Windows, however, it runs through the command line (cmd = MS-DOS). I can set the number of processors in the parameters, but on Windows it still seems to be a bit slow. Is there any limit in MS-DOS on the maximum RAM or the maximum number of processors used?
I want to run standalone BLAST against my downloaded copy of the NCBI nr database. I have 6 cores = 12 logical processors and 50 GB RAM (extendable to 256 GB) on Windows, and 10 cores = 10 processors and 200 GB RAM on Linux.
I am already running it on Windows with a query of 40000 sequences. It has been running for a month and we still have only 15000 sequences in the output file. We set BLAST to a maximum of 1 hit and tabular output format to minimize the run time. We still have no clue how much longer it might take.
Is there any way to speed up the process, or any form of BLAST that lets us see the progress of the task? We don't have financial support, so cloud BLAST is not an option for us.
(assuming the windows version is 64bit)
It is unlikely to be about the OS, and more about the hardware :(
10 cores is a special number - it's probably a new-ish Xeon Ivy Bridge chip. That will definitely have hyperthreading (for 20 virtual cores) and huge amounts of cache per CPU, not to mention 4x more general memory. Just the CPU of that Linux machine probably costs as much as the entire Windows computer.
But you can of course test this by booting the windows machine from a Linux live CD/DVD, and seeing if blast runs significantly faster there than it did on windows. Linux is free, after all -- finances should not be an issue :)
The 24-core PC has been divided between Linux and Windows by Hyper-V. The main OS is Windows. That's how we can extend the Windows RAM: by shutting down Linux.
hehhehe , ok... that might have had some relevance to your question you know ;)
I still don't understand what combination of 6, 10 and 12 makes 24, but for sure it would now seem like either a RAM thing, a 32bit thing, or a blast software thing. I appreciate that isn't a particularly helpful answer.
The main system has 24 processors. Several are hidden and/or being used by Hyper-V itself. I am reporting the details as shown in the system properties on Windows and Linux. So Windows Server is showing 12 processors and Linux is showing 10 processors.
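It would probably have been faster to split the input file in half and run 2 parallel num_threads 6 blasts. I think blast processes the query sequences in order, so to see how many sequences you have already processed, you can look at the last query ID written to the output file,
and then see from your query file how many sequences come before it. Note that the tabular format only lists queries that got a hit, so the number of queries in the output can be lower than the number actually processed.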
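A rough sketch of that check in Python, in case it helps. It assumes the output is the plain tabular format with the query ID in the first column, and that this ID equals the first word of the FASTA header; the function names and file paths are made up for illustration. If blast is still writing, the last line may be incomplete, so treat the result as approximate.

```python
def last_query_id(tab_path):
    """Return the query ID (first column) of the last data line in a
    tab-separated BLAST output file, or None if there are no data lines."""
    last = None
    with open(tab_path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines and comment lines (present in some formats).
            if line and not line.startswith("#"):
                last = line.split("\t")[0]
    return last


def progress(fasta_path, query_id):
    """Return (position, total): the 1-based position of query_id among the
    FASTA records (None if not found) and the total number of records."""
    position = None
    total = 0
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                total += 1
                fields = line[1:].split()
                # Assumption: the ID is the first word of the header line.
                header_id = fields[0] if fields else ""
                if header_id == query_id:
                    position = total
    return position, total


if __name__ == "__main__":
    # Hypothetical file names; replace with your own.
    qid = last_query_id("results.tsv")
    pos, total = progress("queries.fasta", qid)
    print(f"last query in output: {qid} (sequence {pos} of {total})")
```

Dividing the elapsed time by the position and multiplying by the number of remaining sequences gives a crude estimate of the time left, assuming the remaining queries behave like the ones already done.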
Does that mean the Windows command line cannot use all available cores and RAM?
It means that blast does not benefit much from multi-threading beyond a few threads. I usually split large input files and run numerous parallel num_threads 8 jobs on our cluster. See e.g. this. If you have the RAM, the best setup with e.g. 12 cores might be 12 parallel num_threads 1 jobs or 6 parallel num_threads 2 jobs instead of 1 num_threads 12 job. But yeah, this stuff depends on available RAM, cache size, disk I/O speed, etc.
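One way to try the splitting is sketched below in Python. It deals the query records round-robin into N chunk files (so the order inside each chunk is preserved) and builds one command line per chunk. The function names, file names and the choice of `blastp` are assumptions for illustration; use whichever BLAST+ program and database path you are actually running. The script only prints the commands, it does not launch anything.

```python
def split_fasta(fasta_path, n_parts, out_prefix):
    """Deal FASTA records round-robin into n_parts files and return their paths."""
    paths = [f"{out_prefix}.part{i + 1}.fasta" for i in range(n_parts)]
    handles = [open(p, "w") for p in paths]
    try:
        record_index = -1  # no record seen yet
        with open(fasta_path) as fh:
            for line in fh:
                if line.startswith(">"):
                    record_index += 1
                # Lines before the first header are ignored.
                if record_index >= 0:
                    handles[record_index % n_parts].write(line)
    finally:
        for handle in handles:
            handle.close()
    return paths


def blast_command(program, chunk_path, db, threads):
    """Build the argument list for one chunk (tabular output, 1 target per query)."""
    return [
        program,
        "-query", chunk_path,
        "-db", db,
        "-out", chunk_path + ".tsv",
        "-outfmt", "6",
        "-max_target_seqs", "1",
        "-num_threads", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical inputs: 6 chunks with 2 threads each on a 12-processor box.
    chunks = split_fasta("queries.fasta", 6, "queries")
    for chunk in chunks:
        print(" ".join(blast_command("blastp", chunk, "nr", 2)))
```

Each printed command can be started in its own terminal window (or via `subprocess.Popen`), and the per-chunk `.tsv` files concatenated at the end. Whether 6 jobs with 2 threads or 12 jobs with 1 thread works better depends on RAM and disk speed, so it is worth timing a small test chunk first.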
I am sure that test was done on Linux. As far as I know, the Windows command line (DOS) cannot use more than 1 processor. I have lost the reference for that, so I am just asking to confirm.