Any suggestions for basic intro to submitting jobs to an HPC cluster?
6.2 years ago
m98 ▴ 420

I have been using an HPC cluster for a few years now and regularly need to submit jobs that process large numbers (often over 100) of large files, such as BAM files.

Despite some experience, I feel I am lacking an understanding of some of the basic concepts, such as:

  • How to estimate how much RAM and runtime a job will need - I know, it's mostly based on experience and no one can ever answer that for me
  • The relationship between how much RAM you give a job and how much runtime it needs - are these two parameters independent? Will one affect how long you wait in the queue more than the other?

So my question is:

Does anyone know of a nice book/online resource that explains these basic concepts and ideas? I find myself struggling to answer these simple questions, and the documentation out there is very often geared towards explaining complicated details about how supercomputers work. I am interested in all that, but I would like to start with a dumbed-down version first that focuses on how to submit jobs properly.

Any ideas? I should say, the cluster I use runs the Sun Grid Engine (SGE) system.

ngs hpc • 3.6k views

This depends on knowing the tool and quite a bit of trial and error. You can start off by figuring out how to parallelize runs as much as possible, then optimize the RAM, wall time and number of cores for each parallel chunk, and also the RAM, runtime and number of cores for the master thread.

Start off with 16 GB RAM, 4-8 cores and a wall time of 48-72 hours, and tune from there. There are a whole lot of variables that go into the process.
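
For a concrete starting point, a submission script with roughly those requests might look like the minimal sketch below. The parallel environment name "smp", the module names and the input files are assumptions, and on many SGE configurations h_vmem is counted per slot (so 8 slots x 2G = 16 GB total) - check how your site sets this up.

    #!/bin/bash
    #$ -N align_sample              # job name
    #$ -cwd                         # run from the submission directory
    #$ -j y                         # merge stdout and stderr into one log
    #$ -pe smp 8                    # request 8 cores (PE name varies by site)
    #$ -l h_rt=48:00:00             # wall-time limit: 48 hours
    #$ -l h_vmem=2G                 # memory per slot on many setups (8 x 2G = 16 GB)

    module load bwa samtools        # if your cluster uses environment modules

    bwa mem -t "$NSLOTS" ref.fa sample_R1.fq.gz sample_R2.fq.gz \
        | samtools sort -@ "$NSLOTS" -o sample.sorted.bam -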


I doubt you will find any resource that explains these, because it's something you have to figure out for yourself through trial and error. It depends on the program you are using and the data you are processing. There are basically two approaches: 1) be extremely generous for each job and request more memory and time than you could possibly need, or 2) request only the bare minimum memory and time and see if the job completes successfully; if not, bump them up a little and try again.
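
A minimal sketch of approach 2, assuming your site allows qsub -sync y (which makes qsub wait for the job and return its exit status) and that run_step.sh stands in for your actual per-job script:

    #!/bin/bash
    # Start with a small memory request and bump it until the job finishes cleanly.
    for mem in 4G 8G 16G 32G; do
        echo "Trying h_vmem=$mem"
        if qsub -sync y -cwd -l h_vmem=$mem,h_rt=04:00:00 run_step.sh; then
            echo "Job completed successfully with $mem"
            break
        fi
        echo "Job failed (possibly killed for exceeding $mem), retrying with more memory"
    done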


m98, people have invested time to answer your question.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.


The relationship between how much RAM you give a job and how much runtime it needs - are these two parameters independent? Will one affect how long you wait in the queue more than the other?

Some programs are written to run by storing data in memory. Other programs are written to run by working on sorted or other predictably-organized data from file streams. Other programs still work best doing a mix of both. It depends on your program and input.

Without knowing what you're doing, this is a tough question to answer with specifics. Yet:

Giving more memory to a program that uses a constant amount of memory will not change how fast it runs. This will just waste memory. However, if you can split the work and run lots of instances of said program, each working concurrently on a small piece of the problem, then more memory will help the overall task complete in less time, because your overall memory use will be, at most, M x N for constant memory cost M and N jobs.

Also, job schedulers will have an easier time moving many small-memory jobs from the wait queue into the run queue than one monolithic large-memory job, which may need to wait until queue conditions allow allotment of a large chunk of memory.
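
A minimal sketch of that "many small jobs" pattern as an SGE array job (the file list, resource numbers and the samtools command are just placeholders):

    #!/bin/bash
    #$ -cwd
    #$ -t 1-100                     # 100 array tasks, one per input file
    #$ -l h_vmem=4G,h_rt=02:00:00   # each task only needs a modest allocation

    # Each task picks its own line out of a pre-made list of BAM files.
    BAM=$(sed -n "${SGE_TASK_ID}p" bam_list.txt)
    samtools flagstat "$BAM" > "${BAM%.bam}.flagstat.txt"

Because each task asks for only a few GB, the scheduler can start tasks as soon as small slots free up, rather than waiting for one large block of memory.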


Actually, it would help the forum if you could post a few tips/suggestions, as you have years of experience submitting bioinformatics jobs to an HPC cluster. Take dummy data or public-domain data and walk us through to the end. At the least, point to your blog/GitHub repo for scripts. m98

6.2 years ago
pld 5.1k
  1. Nope, you just guess, then optimize.
  2. Nope, not usually. I have a few very lazy scripts that use a ton of RAM (I have it, why not?) and they're fast; some things use very little RAM and can take a relatively long time.

So, if your goal was to come up with some metric that sets a job's wall-time limit based on the RAM for the job, don't; it won't work.

I find myself struggling to answer these simple questions and the documentation out there is very often geared towards explaining complicated details about how supercomputers work.

I think you sort of answered it yourself: you really only need to dig into that documentation if you're doing things that really rely on the intricate details of a cluster. Just like with coding, don't worry about optimizing until you need to.

Here are some tips:

  • Don't worry about over-allocation of resources. As long as you're not allocating 50 nodes with 128 cores/1 TB RAM each to run one instance of gzip apiece, you'll be fine. If you guess that a job needs 64 GB and you only use 47, it's not the end of the world. If you ask for 8 cores and only manage to keep 6 loaded, oh well, next time ask for fewer.

  • Considering most systems kill jobs that exceed their allocated resources, it doesn't hurt to round up. If you undershoot and the job gets killed, you've still wasted wall time. "Wasting" some RAM and some cores is better than restarting an 8-hour job because you needed 64 GB and only asked for 48.

  • Over-allocating wall time is even less of an issue: the resources are freed once the job is done, whether it takes 1 hour or 1 month. All that happens is your jobs may have lower priority than shorter ones.

  • Programs that rely on a database typically need enough memory to load that database. So, unless you know the DB stays on disk, a good starting point is adding the size of the query and the database and multiplying by 1.25. Tweak accordingly.

  • Small, short jobs take priority over long, large jobs.

  • Don't constantly undershoot your wall time; you'll either piss off the admins with constant requests for increases or constantly waste time having to resubmit.

  • Make good use of the scratch space on nodes, and be careful about bogging down the file system by having a ton of jobs trying to do I/O on the same files or directories. If jobs need to constantly access something (say, an index for HISAT2), copy it to scratch and have the job read it from there (see the sketch at the end of this answer).

  • Interactive sessions are great. Use them to debug and benchmark jobs. They're a great tool for finding the little things that get you in trouble, like a program starting more threads than the number of CPUs you allocated for it.

  • Different programs behave differently as you add cores. Some don't have substantial memory increases, some do. Experiment to find out.

  • When in doubt, ask the admins. They know the system and may have helped another user do something similar.

Try stuff, break stuff, bother the admins. Your goal is to get work done; the cluster lets you do it faster. You're not there to optimize HPC software. A cluster is a tool: use it like a hammer.
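
As an illustration of the scratch-space tip above, here is a minimal sketch. The $TMPDIR per-job scratch directory is common on SGE setups but site-specific, and the index path, sample names and resource numbers are made up.

    #!/bin/bash
    #$ -cwd
    #$ -pe smp 8
    #$ -l h_rt=12:00:00,h_vmem=2G

    # Copy the shared index to node-local scratch once, then read it from there
    # instead of hammering the shared filesystem from many jobs at once.
    cp /shared/indexes/grch38.*.ht2 "$TMPDIR/"

    hisat2 -p "$NSLOTS" -x "$TMPDIR/grch38" \
        -1 sample_R1.fq.gz -2 sample_R2.fq.gz \
        | samtools sort -@ "$NSLOTS" -o sample.sorted.bam -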

6.2 years ago

Mostly this is based on what you're doing/using. Different software has vastly different requirements. Some aligners (STAR) will basically eat all the RAM they can get their hands on, as it makes their job easier. There's a bare minimum you'll need to allot for them to run, but they can adjust to memory constraints to an extent. That said, providing them all the resources they need will ultimately make your life easier, particularly if you want to run many samples in parallel from the same job.

A lot of other software won't be able to adjust, however, and will just crash. So it's usually easiest to figure out how long an average sample/run/whatever takes locally and scale from that. I also recommend submitting multiple jobs that each deal with one or a handful of samples, for a few reasons:

  1. Makes errors/crashes a lot easier to track and reduces the amount of time to re-run.
  2. Jobs will spend less time in the queue if your cluster gets heavy traffic, as many in academia do. It's a lot easier to run 50 single-processor, 2 GB memory, 4-hour jobs than one job that will probably end up in a queue meant for pipelines that need special resources or a particularly long time to run.
  3. It's easy to write a script that will automate job submission for you (see the sketch at the end of this answer), meaning you still really only have to write one submission script.

I don't have any literature recommendations, but hopefully somebody else will weigh in. Also, asking the cluster admin is usually worthwhile - they tend to be helpful and enjoy helping people learn to better utilize their system. Some institutions offer workshops that help people deal with the exact questions you have, so it might be worth asking around your institution to see if they're offered (or to suggest that they be).
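
For point 3, a minimal sketch of a driver loop that submits one job per sample (the file layout, resource numbers and process_sample.sh are placeholders):

    #!/bin/bash
    # Submit one small job per sample instead of one monolithic job.
    for fq in data/*_R1.fq.gz; do
        sample=$(basename "$fq" _R1.fq.gz)
        qsub -N "aln_${sample}" -cwd \
             -pe smp 1 -l h_vmem=2G,h_rt=04:00:00 \
             process_sample.sh "$sample"
    done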

6.2 years ago
h.mon 35k

As already stated in the comments above, there is a lot of trial and error. You can keep track of your jobs' resource usage using GNU Time, RunLim or pyrunlim, and then try to predict future resource usage by examining past jobs. Also, the cluster administrators may have some experience to share, so you may try contacting them.
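
For example, with GNU Time you can wrap a command and record its peak memory and elapsed time (a minimal sketch; the full /usr/bin/time path avoids the shell built-in, and the command and file names are arbitrary):

    # The log captures the command's own stderr plus the timing summary.
    /usr/bin/time -v samtools sort -o sample.sorted.bam sample.bam 2> sample.time.log
    grep -E "Maximum resident set size|Elapsed" sample.time.log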

One complication is that some tasks require a more or less predictable amount of memory - like most short-read mappers, where memory usage depends on the genome index size - while others are harder to predict - like de novo assembly, or even index building for short-read mappers.

Regarding memory vs run time: some programs will crash if there is not enough RAM, while others will fall back to disk but run significantly slower - disk I/O is much slower than RAM access.

6.2 years ago
  1. Your estimates should be derived from empirical evidence. SGE has qacct, which will tell you how much memory was consumed at peak by a completed job (see the sketch after this list).
  2. A memory-bound process can certainly be slowed by a lack of memory, but it's more likely to just crash. Computers are great at managing limited CPU but poor at managing limited memory. Another thing that can happen is "thrashing", where CPU usage will spike when memory runs out.
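
For example (a minimal sketch; the job ID is a placeholder), after a job finishes:

    # Ask SGE's accounting database what the job actually used,
    # then base the next submission's requests on these numbers.
    qacct -j 1234567 | grep -E "maxvmem|ru_wallclock|exit_status"
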
6.2 years ago
GenoMax 147k

How to estimate how much RAM and runtime a job will need

This information is going to come from looking at the documentation that comes with the software (not always a sure thing) and from perusing the forums. You will have to start with a minimum recommended amount, but throwing the kitchen sink at a job will just end up wasting resources. You could use a sub-sample of your data to get an idea of the run time. A couple of hundred thousand reads may be sufficient to get a rough idea of what the run time will look like. With programs that support threading/multiple CPUs you will see some speed-up in execution time, but it likely will not be linear. There may be some nuances as to what changes you need to make to the RAM allocation with multiple cores, but that will be very program-dependent.
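
A minimal sketch of that sub-sampling idea, assuming seqtk is available (the read count, seed and file names are arbitrary; using the same seed keeps the read pairs in sync):

    # Take 200,000 read pairs and time the aligner on them.
    seqtk sample -s100 sample_R1.fq.gz 200000 > sub_R1.fq
    seqtk sample -s100 sample_R2.fq.gz 200000 > sub_R2.fq
    /usr/bin/time -v bwa mem -t 4 ref.fa sub_R1.fq sub_R2.fq > /dev/null 2> sub.time.log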

The relationship between how much RAM you give a job and how much runtime it needs - are these two parameters independent? Will one affect how long you wait in the queue more than the other?

Not predictable, not completely and yes (in that order).

A rule of thumb is to allocate the minimum recommended amount of RAM (remember, there is no substitute for actual RAM) + 10% to account for overheads, the particular configuration of your cluster, etc. Some programs will page data to local storage if enough RAM is not available, which will increase the run time. On the other hand, if you have a TB of RAM available then you could read the entire nr BLAST index into memory and speed searches up. Most places have fewer nodes/job slots with access to lots of RAM, so you will likely wait in the queue longer with large-memory jobs.
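
So if, say, a tool's documentation recommends 30 GB, a request along these lines would follow that rule of thumb (a minimal sketch; run_tool.sh is a placeholder, and whether h_vmem is counted per job or per slot depends on your cluster's configuration):

    # 30 GB recommended + ~10% overhead = 33 GB
    qsub -cwd -l h_vmem=33G,h_rt=24:00:00 run_tool.sh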

I have been using an HPC cluster for a few years now and regularly need to submit jobs that process large numbers (often over 100) of large files, such as BAM files.
Despite some experience, I feel I am lacking an understanding of some of the basic concepts

I find that a bit surprising. Perhaps you are being modest, or are truthfully recognizing a deficiency. What you have been doing so far has probably got you half-way there. Talking with your local fellow power users/sysadmins would be an excellent way to remediate this deficiency. If you have not had a sysadmin get on your case for doing something "out of bounds" on your cluster, then you have not come anywhere close to what is possible/acceptable!

Always remember to experiment with a couple of samples first before trying to start jobs with 100s.

