multiprocesses vs multithreads in application
7.3 years ago

Hi,

I know little about writing tools, so sorry if my question is too obvious. I am still confused after searching Google.

I know that setting an application to use as many cores as it can support (assuming the machine has that many available) gets it running at its maximum performance. But what about applications that support multiple processes?

Specifically, suppose an application supports multiple processes (the documentation says nothing about whether it supports multiple threads, nor about an upper limit on the number of processes), and the machine has 64 GB of memory and 16 cores (one thread per core). How should I set the -p processes option to get the application running at its maximum performance with the available resources?

Any help would be greatly appreciated!

software error • 2.5k views

I'm sorry, I don't mean to be rude, truly, but unless you know if your application supports multicore processors and shared memory, this question seems unanswerable.


Thanks Alex. From the application's documentation, there is no multi-threads option. I'd better check with the developer. An application supporting multi-processes does not necessarily support multi-cores, does it?

And for the multi-processes option, will the application be quicker with -p 3 than with -p 1 (with the same other arguments)?

Thank you.


If your analysis allows for it, you can always parallelize by brute force, e.g. by splitting your input FASTQ files into multiple pieces and starting multiple single-core jobs. The result files can then be combined into one (e.g. in the case of alignments).
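
The split, run, and combine idea above can be sketched in Python. This is only an illustration under stated assumptions: `analyse_chunk` is a hypothetical stand-in for whatever per-record analysis you actually run, and the records are a small in-memory list rather than real FASTQ files.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal pieces."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def analyse_chunk(records):
    """Hypothetical per-chunk analysis: here, just the length of each record."""
    return [len(r) for r in records]


def run_in_parallel(records, n_jobs):
    """Run analyse_chunk on each piece in its own process and merge results in order."""
    chunks = split_into_chunks(records, n_jobs)
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        partial_results = pool.map(analyse_chunk, chunks)
        return [value for part in partial_results for value in part]


if __name__ == "__main__":
    reads = ["ACGT", "GGC", "TTAGGA", "A", "CCGGTT"]  # made-up example records
    print(run_in_parallel(reads, n_jobs=3))
```

Each worker is a separate single-threaded process, so this only helps when the chunks can be analysed independently of each other.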


Thanks for all your replies. I understand the concepts of thread, processor and process now.

The application I am using works on a BAM file and performs a series of analyses at each site. These analyses are carried out by third-party tools (samtools, bedtools, ...) plus functions written by the developer. I guess the application's -p option (split into multiple processes) is different from what we are discussing. Does it mean that different third-party tools can run on different sites simultaneously? For example, samtools working on one site on one processor while bedtools works on another site on another processor. If so, I cannot see the advantage over a single process, which I guess means that the same tool works on different sites on different processors at the same time.

Sorry if the above seems confused.

Another question: why can the CPU usage of each copy of the application be above 50%? I ran two copies of the application, and the sum of their CPU usage is above 100%.

  PID USER  %CPU %MEM COMMAND
14004 root  95.3  3.6 python
21521 root  94.7  3.6 python


Usage of each core is reported independently, so multiple cores can add up to several hundred percent (e.g. 8 fully used cores would show as 800%). For example:

   PID USER PR NI   VIRT    RES    SHR S  %CPU %MEM   TIME+ COMMAND
125434 x    20  0 0.491t 0.454t 182428 S 366.7 15.5 8580:20 XX
125521 x    20  0 0.491t 0.446t 182432 S 361.4 15.3 8536:57 XX
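
To make the arithmetic explicit, here is a small Python sketch that turns a %CPU value from output like the above into an approximate number of busy cores. It assumes top is in its default mode, where 100% corresponds to one fully used core; the function name `cores_from_top` is made up for this illustration.

```python
def cores_from_top(header, row):
    """Approximate number of fully busy cores for one row of top output.

    Assumes the per-core reporting mode, where 100% means one busy core.
    """
    columns = header.split()
    fields = row.split()
    # Look up the %CPU column by name instead of hard-coding its position.
    cpu_percent = float(fields[columns.index("%CPU")])
    return cpu_percent / 100.0


if __name__ == "__main__":
    header = "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND"
    row = "125434 x 20 0 0.491t 0.454t 182428 S 366.7 15.5 8580:20 XX"
    print(cores_from_top(header, row))
```

For the first row above this gives a little under 4, i.e. the process is keeping between three and four cores busy on average.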

So does that mean each of the two commands is using about 4 CPUs?


Yes. That is correct.


jing.mengrabbit: Please use ADD REPLY/ADD COMMENT to respond to existing posts, to keep the threads logically organized.


Thanks for your advice. I will follow it.

7.3 years ago
kloetzl ★ 1.1k

Let's get our terminology right: a core is just a fancy name for one of several processors residing on the same chip. A process is a running copy of a program, and a thread is a subprocess sharing (some) memory. With hyper-threading, a single core can execute multiple threads simultaneously; these are then called logical processors. So multi-core and multi-processor are hardware features, whereas multi-threading is a software feature.

By the way, if you think your CPU has 8 cores, it most likely has just four plus hyper-threading.

I know that selecting the same number of cores that an application can support will get the application running at its maximum performance.

This is an oversimplification, but should be true in general. Good programs will detect the number of available (logical) processors and use that many threads.
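
As a rough illustration of what such detection can look like, here is a Python sketch. The function name `default_worker_count` is hypothetical; `os.sched_getaffinity` exists only on some platforms (e.g. Linux), so the sketch falls back to `os.cpu_count()` elsewhere.

```python
import os


def default_worker_count():
    """Number of logical processors this process may use (at least 1)."""
    if hasattr(os, "sched_getaffinity"):
        # Respects CPU pinning, e.g. when a job scheduler restricts the process.
        return max(1, len(os.sched_getaffinity(0)))
    # os.cpu_count() may return None if the count cannot be determined.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("logical processors available:", default_worker_count())
```

Note that this counts logical processors, so with hyper-threading it reports more than the number of physical cores.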

Then how about the multi-processes supporting applications?

"multi-processes" is something very different from "multi-threading" and is much harder to pull off. Supporting "multi-processors" is done by "multi-threading". It's confusing, I know. ☺

Specifically, if an application supports multi-processes (from the documentation there is no information about whether or not it supports multi-threads, and no information about the upper limit of multi-processes), and the machine has 64GB memory and 16 cores (…), then how to set the option -p processes to get the application running at its maximum performance with available resources?

My guess: the documentation does not use the terms I defined above. The simplest thing to do here is to open your system monitor. Start the program with prog -p 16 and see whether all your CPUs are being used (once it has finished reading in files). If your tool simply splits into 16 threads, it should show up as one process using 100% of your CPU (tools such as top that report usage per core would show roughly 1600% instead). If you can see the number of threads, it should be 16.

If -p 16 somehow does not give you the desired result, you should aim for the smallest number of threads that uses 100% of your CPU.
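
If watching the monitor is inconvenient, one alternative is to time a few runs and pick the smallest -p that is about as fast as the best one. Note that this swaps the criterion above (CPU usage) for wall-clock time. The Python sketch below assumes you have already measured the runtimes yourself; the function name, the 5% tolerance, and the numbers in the example are all made up.

```python
def smallest_sufficient_p(runtimes, tolerance=0.05):
    """Smallest -p value whose runtime is within `tolerance` of the fastest run.

    runtimes maps a -p value to the measured wall-clock time in seconds.
    """
    best = min(runtimes.values())
    good_enough = [p for p, seconds in runtimes.items()
                   if seconds <= best * (1 + tolerance)]
    return min(good_enough)


if __name__ == "__main__":
    # Hypothetical timings, not measurements of any real tool.
    measured = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0, 16: 15.5}
    print(smallest_sufficient_p(measured))
```

With these made-up timings, going from 8 to 16 barely helps, so the sketch settles on the smaller value and leaves the remaining cores free for other work.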


Warning: -p 16 might hog all the system resources, slowing the kernel and the OS to a crawl. For a test, -p 14 or an even smaller value can do the job.

+1 for a descriptive answer. I would like to add that in complex programs it is not always obvious how much speedup you will get from more cores, and more often than not the speedup is not linear in the number of cores. Some parts of a program cannot be parallelized, and these essentially determine the final speedup. Also, depending on how the data are parallelized, the speedup may saturate beyond a certain number of cores. Ultimately, the developers of the program should tell you about the expected speedups, because they know best how the parallelization has been implemented.
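
The effect of the non-parallelizable part is usually described by Amdahl's law: if a fraction f of the runtime can be parallelized, the speedup on n cores is at most 1 / ((1 - f) + f / n). A small Python sketch, where the value f = 0.9 is purely an assumption for illustration and not a property of any particular tool:

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Upper bound on speedup when only parallel_fraction of the runtime scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # Assumed: 90% of the runtime is parallelizable.
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

Under that assumption the speedup can never exceed 10, however many cores are used, which is the saturation described above.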


Personally, I never worry about allocating too many threads in Linux... when testing multithreaded processes in Windows, I routinely see the computer become unresponsive if I use all logical cores without explicitly setting a reduced priority (at which point it's hard to kill the process since everything is sluggish). But in Linux, I've never had the issue arise.


Why do we not need to worry about allocating threads in Linux? Can Linux adjust the resource allocation automatically?


Linux can handle multithreaded processes fine, but it's still up to the authors of a user program to handle threading as well; you can't give a singlethreaded program to Linux and expect Linux to make it multithreaded, any more than you can expect a good mechanic to make a 1-cylinder engine into a 4-cylinder engine (though they can easily change a 4-cylinder engine into a 1-cylinder engine, by simply disabling some of the cylinders). As kloetzl noted, many programs will automatically detect the number of logical processors available and spawn that number of threads. But not all of them will, and that's not always optimal (though it's usually OK), so most multithreaded programs offer thread-count overrides.
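
A thread-count override of that kind might look like the following Python sketch. The tool and its -t/--threads option are hypothetical; the only point is that the default comes from the detected processor count and the user can override it.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the command line of a hypothetical multithreaded tool."""
    parser = argparse.ArgumentParser(description="Hypothetical multithreaded tool")
    parser.add_argument(
        "-t", "--threads",
        type=int,
        default=os.cpu_count() or 1,  # auto-detect unless the user overrides it
        help="number of worker threads (default: all logical processors)",
    )
    args = parser.parse_args(argv)
    if args.threads < 1:
        parser.error("--threads must be at least 1")
    return args


if __name__ == "__main__":
    print("using", parse_args().threads, "threads")
```

Running it with -t 4 would cap the tool at four threads even on a 16-core machine, which is useful when the machine is shared.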


Thank you. Where can I see the number of threads? Which command shows it?
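
On Linux, top -H lists individual threads and ps -o nlwp -p PID prints the thread count of a process. The same number is also available in the Threads field of /proc/PID/status, which the Python sketch below reads. It is Linux-only, and the function names are made up for this illustration.

```python
import os


def threads_from_status(status_text):
    """Extract the thread count from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' field found")


def thread_count(pid=None):
    """Number of threads of a process (Linux only; defaults to this process)."""
    pid = os.getpid() if pid is None else pid
    with open(f"/proc/{pid}/status") as status_file:
        return threads_from_status(status_file.read())


if __name__ == "__main__":
    print("threads in this process:", thread_count())
```

Pass the PID shown by top to `thread_count` to inspect a running tool instead of the script itself.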
