Question

Wait For A Job To Complete And Give Results Before Starting A New One

1

12.4 years ago

jamespower371 ▴ 40

Hi, I need to start a new job once a loop has produced its results. The loop is in R and uses the 'system' command to submit my jobs, submitting many jobs for each element of the loop and writing results files. Is there a way to then submit a new job that uses the results from the loop once they have been created, either in R or Bash? Thank you for your help.

EDIT

Thank you, Leonor. I would like to start the script after the whole loop has finished and all of its results files are complete. Is there a way to do this? For example:

    for (file in names) {
        # Build the script name from the loop variable, e.g. script.file.<name>
        system(paste0('qsub script.file.', file))
    }

Each script in script.file.names contains a job that uses 'Rscript'. I would just like to know when all the results from this loop have appeared in the folders, so that I can work with them once all jobs have completed…
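One way to read "know when all the results appear" is to poll for the expected output files. Below is a minimal sketch of that idea in Python; the function name `wait_for_files` and its parameters are made up for illustration and are not part of R, qsub, or any module mentioned in this thread. It assumes you know the output file names in advance.

```python
import time
from pathlib import Path


def wait_for_files(paths, poll_seconds=30.0, timeout=None):
    """Block until every path in `paths` exists, then return them as Path objects.

    Raises TimeoutError if `timeout` seconds pass first (None waits forever).
    """
    pending = {Path(p) for p in paths}
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        # Drop every file that has appeared since the last check.
        pending = {p for p in pending if not p.exists()}
        if not pending:
            return [Path(p) for p in paths]
        if deadline is not None and time.monotonic() >= deadline:
            missing = ", ".join(sorted(str(p) for p in pending))
            raise TimeoutError(f"still waiting for: {missing}")
        time.sleep(poll_seconds)
```

Note that a file can exist before the job has finished writing it. A common workaround is to have each job write to a temporary name and rename the file as its last step, so that the final name only appears once the output is complete.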

r bash programming • 9.1k views

ADD COMMENT • link updated 7.0 years ago by steve ★ 3.5k • written 12.4 years ago by jamespower371 ▴ 40

0

I would suspect that each system command blocks, so your code stays inside the for-loop until the system commands are done. If so, simply writing your next set of scripts, using the file names created in the for-loop, immediately after the loop should do it. Did you verify this?

ADD REPLY • link 12.4 years ago by Arun 2.4k

0

Just edited your question to add the details you had provided below. Below is the space for "answers", so it is better to edit your question than to add the information as an answer. I've edited my answer to fit your question.

ADD REPLY • link 12.4 years ago by Leonor Palmeira 3.9k

0

Closed, as this is not a bioinformatics question.

ADD REPLY • link 12.4 years ago by Leonor Palmeira 3.9k

0

I bet a lot of people use this kind of thing in bioinformatics all the time, and the answer would be instructive to them. If a question is considered "not bioinformatics" by a moderator, or an answerer, they could easily make it more related by mentioning an application or context or example in their answer, rather than simply close the question and snuff the discussion on something that does have utility in bioinformatics.

ADD REPLY • link 12.4 years ago by seidel 11k

0

Learning how to search for the answer to your own question is a crucial skill. This question, for instance, is about 'learning how to use qsub' and has (sorry about that) nothing specifically to do with bioinformatics. I did take the time to answer, however, so I didn't close it and leave the OP in the dark; I just referred him to another resource.

ADD REPLY • link 12.4 years ago by Leonor Palmeira 3.9k

0

You seem to be saying that qsub, or the ability to submit jobs after a loop, has no utility or relevance to bioinformatics and shouldn't be discussed here. I beg to differ. If jamespower371 had phrased his question with the word bioinformatics in it, as in "When I do bioinformatics analysis I often....", the content wouldn't have changed, but fewer people would see it as unrelated. I was trying to point out that there are two solutions to the perceived lack of bioinformatics relevance: (1) follow an unimaginative administrative impulse and close the question (I commend you, by the way, for identifying yourself as the closer; few do this), or (2) add or ask for specific content which connects the dots for people like me, and shows an example of how qsub would be used in BLAST, for example, or, as in your answer, how one might connect some bam, sort, index operations. I submit that your answer (the non-qsub part and the qsub part) has the potential to be informative for people doing bioinformatics, and thus belongs here (and would simply be enhanced by some specific content, inserted in either the question or the answer).

ADD REPLY • link 12.4 years ago by seidel 11k

1

I am just saying that there are a lot of things you learn along the way while learning bioinformatics skills. And I think it is important to be able to separate these into different pots (technical questions, methodological questions, biological questions, ... this is not an exhaustive list). When you are able to separate these, you have learned something: that there are different levels/aspects in the problems you are dealing with, and different experts to answer each one of them. This is also useful in research in general, and will help you to search for the right collaboration.

Let's take the qsub question: being able to determine that it is just a 'qsub' question will lead you to (i) ask your question with precision and detail and (ii) ask the right person (probably stackoverflow for this one). For me, this has nothing to do with adding a 'bioinformatics' context to a technical question. I am able to see where this question is coming from, and I also use 'qsub', but questions such as 'how to merge files by columns' or 'how to grep this pattern in my file' or... shouldn't be posted here, because they (I insist) are not 'bioinformatics questions'. :-)

ADD REPLY • link 12.4 years ago by Leonor Palmeira 3.9k

0

Of course, you're right. But you're speaking as someone with the benefit of hindsight, as an expert who can see separations and connections between pots. Novices are often too confused to see them, and an expert can lay them out and make things clear, offer direction. 'qsub' is a good example. I don't know what it does. Looking at this thread I might conclude that it has no utility in bioinformatics research. I've learned nothing about it (and perhaps concluded something wrong). Closing the question doesn't help this at any level. I'm simply arguing for quality over restriction. Increasing the quality of the question (the thread), rather than shutting it down, will help everyone. Experts and novices alike, all trying to do bioinformatics, will read these questions. You may be bored by a question on how to merge files by columns and think it belongs elsewhere, but a novice may not know your cool trick for merging some annotation onto a table of gene expression results (most biologists I know don't know how to do this). A bad question is an opportunity to add useful instructive content for everyone. Closing the question truncates that paradigm.

All of this assumes, of course, that 'qsub' can be used in some way for bioinformatics. I'll admit the numbers are on your side, my experience here is that people (not necessarily you) favor closing down questions rather than adding some good content which could make them better - and get them closer to being a 'bioinformatics question'. :-).

ADD REPLY • link 12.4 years ago by seidel 11k

1

It hasn't been long enough since I was a (biology) student that I have forgotten how I behaved when I didn't know something: I would first try to circumvent my problem, figure things out and put things into pots. I don't know why people don't do this, and I still think technical questions should be asked on technical Q&A sites because you will get better answers with details on the ins and outs.

That being said, I think we basically agree, and to prove my goodwill, I will re-open this question so that others can still contribute to it.

ADD REPLY • link 12.4 years ago by Leonor Palmeira 3.9k

score 2 · Answer 1 · 2012-07-19

If you run a script like this, the second script will only be run once the R command is finished:

#!/bin/bash

R --vanilla --no-save << EOF
myRcommands(output="results.txt")
EOF

mySecondScript --input results.txt

If you were running the commands interactively in the terminal, you could chain them with && (bash waits for the first command to finish and runs the second one only if the first succeeded):

myFirstScript --output results.txt && mySecondScript --input results.txt
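For anyone driving this from Python instead of bash, here is a rough equivalent of the `&&` chaining above. It is only a sketch: `run_in_order` is a made-up helper name, and the commands you pass in stand in for myFirstScript and mySecondScript.

```python
import subprocess


def run_in_order(*commands):
    """Run each command (an argument list) in turn, stopping at the first failure."""
    for command in commands:
        # run() blocks until the command exits; check=True raises
        # CalledProcessError on a non-zero exit status, so later commands
        # are skipped, which mirrors `first && second` in bash.
        subprocess.run(command, check=True)
```

For example, `run_in_order(["myFirstScript", "--output", "results.txt"], ["mySecondScript", "--input", "results.txt"])` would only start the second script once the first has exited successfully. As with the bash version, this only helps when the first command itself does the work, not when it merely submits a job to a queue.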

EDIT

Of course, now that you have added more detail, I see you are using 'qsub', and this basically changes everything, because the job handling is now done by qsub and not within bash. 'qsub' puts your job in the queue and returns immediately, so bash is ready for the next command even though the job may not have run yet.

This question was already answered on Stack Overflow: http://stackoverflow.com/questions/3886168/how-to-automatically-run-a-bash-script-when-my-qsub-jobs-are-finished-on-a-serve
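If the cluster runs SGE (Sun/Son of Grid Engine), one common approach is to let the scheduler handle the dependency: submit the follow-up job with `-hold_jid` so it stays held until the listed jobs have finished. The sketch below assumes SGE's qsub and its `-terse` and `-hold_jid` options; the helper names `build_qsub_command` and `submit` are made up for illustration. PBS/Torque uses a different syntax (`-W depend=afterok:<jobid>`), so check which scheduler you have first.

```python
import subprocess


def build_qsub_command(script, hold_job_ids=()):
    """Return the argument list for submitting `script` with SGE's qsub.

    If `hold_job_ids` is non-empty, the job is held in the queue until all
    of those jobs have finished (SGE's -hold_jid option).
    """
    # -terse makes SGE's qsub print only the job ID instead of a full sentence.
    command = ["qsub", "-terse"]
    if hold_job_ids:
        command += ["-hold_jid", ",".join(str(j) for j in hold_job_ids)]
    command.append(script)
    return command


def submit(script, hold_job_ids=()):
    """Submit `script` and return the job ID printed by qsub (needs a cluster)."""
    result = subprocess.run(
        build_qsub_command(script, hold_job_ids),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

With these, something like `job_ids = [submit(s) for s in scripts]` followed by `submit("final.sh", job_ids)` would queue a hypothetical `final.sh` that only starts after every job from the loop is done. Alternatively, SGE's `qsub -sync y` makes qsub itself wait until the job completes, which brings back the plain "run one thing after another" behaviour at the cost of blocking the submitting process.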

score 1 · Answer 2 · 2017-11-16

Old question, but for anyone who likes Python and is trying to manage HPC cluster job submissions, I came up with a module for this here. You can run the module directly as a script (python qsub.py) for a demo.

Usage:

$ git clone https://github.com/NYU-Molecular-Pathology/util.git
$ cd util
$ python
Python 2.7.3 (default, Mar 29 2013, 16:50:34)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import qsub
>>> job = qsub.submit(command = 'echo foo; sleep 60', print_verbose = True)
qsub command is:

qsub -j y -N "python" -o :"/home/util/" -e :"/home/util/" <<E0F
set -x
echo foo; sleep 60
set +x
E0F

>>> qsub.monitor_jobs(jobs = [job], print_verbose = True)
Monitoring jobs for completion. Number of jobs in queue: 1
Number of jobs in queue: 0
No jobs remaining in the job queue
([Job(id = 4112505, name = python, log_dir = None)], [])

You can integrate this with your workflow to run whatever tasks you need more easily. Do a thing, submit some jobs, wait for them to complete, do more things.

Designed with Python 2.7 and SGE, since that's what our system runs. The only non-standard Python libraries required are the included tools.py and log.py modules, and sh.py (also included).