Question

PBS script on cluster

8.1 years ago

mary99 ▴ 80

Hello all,

I got several scripts in PBS format, first to understand how they work and then to apply them to new data, but I am not familiar with PBS scripts. I mainly want to know why we use PBS, and whether we just run the script as it is (with the input included in the script) or need to run it on input from outside the script. I am sorry if my question is so basic. If you could provide a simple sample, it would be helpful. Thanks in advance.

Cluster script • 2.8k views

updated 8.1 years ago by Devon Ryan 104k • written 8.1 years ago by mary99 ▴ 80

score 7 · Answer 1 · 2016-10-14

This isn't really the right forum for this sort of question, but I'll take pity because the right one (stackoverflow) isn't exactly the most welcoming of places.

PBS is what's called a "distributed resource manager". In most research institutes, there are large computer clusters, made up of a number of individual computers (nodes) with various resources (e.g., different amounts of memory, processor speeds, or numbers of cores). You can't manually access any of these "nodes"; rather, you tell PBS what you want to do ("submit a job to PBS") and give it a hopefully accurate representation of the resources you want (the number of cores is probably the most important thing for you), and it will then run your job on an appropriate node. Importantly, this allows many, many more jobs to be scheduled than there are available resources, and allows a variety of prioritization methods to be used to determine the order in which jobs run. This methodology also allows for accounting of how much time/resources individuals or groups are using (possibly charging them accordingly).

There are MANY different "distributed resource managers" besides PBS, though it's certainly one of the more common ones.

Regarding exactly how to use PBS, you should receive training from the cluster administrator or someone else from IT. They can instruct you on the queues to use, how accounting is done, basic commands, and give you important information about where things are on your cluster.

BTW, for actually understanding the scripts, note that they're mostly just bash scripts, so head over to the wikipedia article if you're not already familiar with it.

score 4 · Answer 2 · 2016-10-14

A PBS script contains all the information needed to run a specific job on a cluster that uses the PBS job scheduler.

Generally, the beginning of the script holds options for time, memory, cores, etc. The program (and its options) will be towards the end of the script. Generally you submit this file to the PBS scheduler by running qsub script.sh
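As a rough sketch of that layout (options at the top, program at the end, then qsub), the snippet below writes a tiny job script and submits it only if qsub is found. The file name example_job.pbs and the resource values are made up for illustration; your cluster will likely need different ones.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal PBS job script, then submit it if qsub is available.
set -euo pipefail

# Hypothetical file name; resource values are placeholders, not recommendations.
cat > example_job.pbs <<'EOF'
#!/bin/bash
#PBS -N example_job
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00

# Jobs start in the home directory; go back to where qsub was run.
cd "$PBS_O_WORKDIR"

# The actual work goes last.
echo "Running on $(hostname)"
EOF

if command -v qsub >/dev/null 2>&1; then
    qsub example_job.pbs    # on success, qsub prints the ID of the new job
else
    echo "qsub not found: wrote example_job.pbs but did not submit it"
fi
```

On a machine without PBS this only creates the file, so you can inspect it before trying it on a real cluster.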

If you search the web with "pbs script" you should see plenty of help pages. Here is one example.

score 3 · Answer 3 · 2016-10-14

Imagine two people want to run their programs on the same computer. If the resources have not been distributed correctly, both programs would struggle to run and are likely to crash. PBS allows you to declare in advance what resources you need to run your program, so that people can work on the same computer without any conflict.

Now that that's out of the way, let me provide an example of a PBS script:

#PBS -N TEST    
#PBS -l nodes=1:ppn=8,vmem=20gb
#PBS -l walltime=48:00:00
#PBS -q default
#PBS -V
cd /Directory
command input.txt > out.txt

The first line, with the -N option, sets the name of the job that will be displayed when you check the general queue. The second line indicates what resources I need: 1 node (1 computer), 8 processors (cores) on that node and 20 GB of virtual memory. The next line says that my job should run for 2 days at the most. The next line indicates which queue the job is going to run in, in this case the default queue; this may change depending on how your cluster is configured. The next line tells PBS to export all the variables from your current environment to the job. Next is a change of directory, because the job starts in your home directory after you send it to the queue, and last but not least come all the commands to be run during the job.
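On the original question of whether the input has to be written inside the script: one option is to leave it out and pass it at submission time with qsub -v, which sets environment variables for the job. A minimal sketch, assuming a made-up variable name INPUT and made-up file names (count_lines.pbs, sample.txt):

```shell
#!/usr/bin/env bash
# Sketch: a job script that takes its input file from an environment variable.
set -euo pipefail

cat > count_lines.pbs <<'EOF'
#!/bin/bash
#PBS -N count_lines
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00

# Go to the submission directory (fall back to "." when run outside PBS).
cd "${PBS_O_WORKDIR:-.}"

# INPUT must be supplied from outside; abort with a message if it is missing.
wc -l < "${INPUT:?set INPUT, e.g. qsub -v INPUT=file}" > "${INPUT}.count"
EOF

# Small test input (hypothetical data).
printf 'a\nb\nc\n' > sample.txt

if command -v qsub >/dev/null 2>&1; then
    # -v passes the listed variables into the job's environment.
    qsub -v INPUT=sample.txt count_lines.pbs
else
    # No PBS here: run the same script directly to check the logic.
    INPUT=sample.txt bash count_lines.pbs
fi
```

The same script can then be reused on new data by changing only the value given to -v.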

If you do want to run it interactively, typing the commands and input yourself, I suggest you use an interactive session like this:

qsub -I -V -q test -l nodes=1:ppn=4,mem=8g

However, this is not highly desirable, and might be limited depending on how your cluster is configured.