I have a file named "starmapping.sh" which I use for mapping fastq files with STAR. My usage is:
sbatch starmapping.sh SRR123456
I have been using it to map individual fastq files for a while. Currently I have 10 fastq files; if I wait for one submitted job to finish and then submit another, I have to keep checking the queue status now and then.
I am wondering if there is a way to run starmapping.sh on all the fastq files sequentially so that I don't need to submit them one after the other. I ask because our university server allocates resources more readily around midnight. I want to submit the jobs at 12am and let them run, so that the next day I can do the downstream analysis.
Thanks,
The main content of my starmapping.sh script is like this:
#!/bin/bash
#SBATCH --job-name=starmapping_sbatch
#SBATCH --output=starmapping_sbatch.out
#SBATCH --error=starmapping_sbatch.err
#SBATCH --time=02:00:00
#SBATCH --partition=bigmem2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=10
#SBATCH --mem-per-cpu=20G
# get the input data file
INPUT=${1}
module load STAR
STAR --genomeDir /scratch/midway2/caiqi/GRCh37_star_index_150bp \
--runThreadN 16 \
--readFilesIn ${INPUT}.fastq \
--outFileNamePrefix GRCh37${INPUT} \
--twopassMode Basic \
...
You could simply do something like this:
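(a sketch, assuming the fastq files sit in your submission directory and are named like SRR123456.fastq; adjust the glob if your names differ)
for FILE in *.fastq; do
    SAMPLE=${FILE%.fastq}              # strip the .fastq suffix, since starmapping.sh appends it
    sbatch starmapping.sh "${SAMPLE}"
done
The loop passes the accession without the .fastq suffix because the script builds the file name itself (--readFilesIn ${INPUT}.fastq).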
If your jobs are not going to get resources right away they will simply pend. They will then start executing, depending on the job slots/resources allocated to your account, until they all complete.
If you absolutely want to submit the jobs before midnight then you could put the for loop in a script and schedule that to run using the at unix command.
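A minimal sketch, assuming the loop is saved as loop_script.sh in the directory you run at from (the time formats at accepts can vary slightly between systems):
# queue the submission script for midnight; at jobs start in the current working directory
echo "bash loop_script.sh" | at midnight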
Note: You are asking for 16 threads on your STAR command line but requesting only 10 tasks per node (and 1 node) on the SLURM side. You should match those two numbers, either up or down.
Thanks! I will put the for loop in a "loop_script.sh" file and run it with bash. I will come back and update on how it works.
Regarding your comment: "Note: You are asking for 16 threads on your STAR command line but requesting only 10 tasks per node (and 1 node) on the SLURM side. You should match those two numbers, either up or down."
I have been using this for a while; will it cause a problem for downstream analysis?
Thanks again for the reminder.
No, it won't cause a problem with the data/analysis. Even though STAR asks for 16 cores, it is only going to get 10 from SLURM, so your jobs may be running a little slower than they could.
As genomax said, no. The opposite (when one reserves more resources than used), however, is worse: it leads to under-utilization of the cluster and, if your cluster usage is billed, to higher bills with no return in performance.
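For reference, one way to keep the two numbers in sync is to let STAR read its thread count from the SLURM allocation instead of hard-coding it (a sketch; SLURM_NTASKS_PER_NODE is set in the job environment when --ntasks-per-node is requested):
#SBATCH --ntasks-per-node=10
# on the STAR command line, take the thread count from what SLURM actually granted
STAR --runThreadN "${SLURM_NTASKS_PER_NODE}" --genomeDir /scratch/midway2/caiqi/GRCh37_star_index_150bp ...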
Thanks a lot for your comments. I am going to revise it so the two numbers match.
I just checked the running progress. It worked well. One question: the starmapping_sbatch.err and starmapping_sbatch.out files were overwritten. Is there a way to keep them for each individual fastq?
Try the following.
Option 1:
Replace these two lines from your STARmapping script:
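(these are the output/error directives near the top of the script)
#SBATCH --output=starmapping_sbatch.out
#SBATCH --error=starmapping_sbatch.err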
With:
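(a sketch rather than the only possibility: %j is SLURM's job-ID filename pattern, so every submitted job writes its own pair of log files; %x would substitute the job name instead)
#SBATCH --output=starmapping_%j.out
#SBATCH --error=starmapping_%j.err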
Test it to make sure it works by using echo.
Option 2:
Remove these lines from your script
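(again, the two output/error directives)
#SBATCH --output=starmapping_sbatch.out
#SBATCH --error=starmapping_sbatch.err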
And then try
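(a sketch; sbatch's -o and -e options set the output/error file names from the command line, so each sample keeps its own pair)
for FILE in *.fastq; do
    SAMPLE=${FILE%.fastq}
    sbatch -o "${SAMPLE}_star.out" -e "${SAMPLE}_star.err" starmapping.sh "${SAMPLE}"
done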
Many thanks for the comprehensive solutions!
Hi genomax:
Today I got new fastq files and tested your latest advice. For option 1, where should I put the echo command? I ran option 2, but I got an error in one of the SRR .err files:
After modification as advised in option 2, line 4 will be:
So, what/where could be the problem?
Thank you,
The echo command is just to make sure the command lines look correct. You can add echo $1 to the script and check that it is printing the name of the current file. $1 should be interpreted as the name of the file the program is currently working on. I don't understand why you got the error you posted above.
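For example, right after the input is captured in starmapping.sh (a sketch; the echo can be dropped once the names look right):
INPUT=${1}
echo "${INPUT}"    # same as echo $1; should print the accession, e.g. SRR123456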