Running sbatch on multiple fastq files sequentially
4.4 years ago
Kai_Qi ▴ 130

I have a file named "starmapping.sh" which I use for mapping fastq files with STAR. I run it like this:

sbatch starmapping.sh SRR123456

I have been using it to map individual fastq files for a while. Currently I have 10 fastq files. If I wait for one submitted job to finish before submitting the next, I have to keep checking the queue status.

I am wondering if there is a way to run starmapping.sh on all the fastq files sequentially, so that I don't need to submit them one after the other. I ask because our university server allocates resources more readily around midnight. I want to submit the jobs at 12am and let them run, so that the next day I can do the downstream analysis.

Thanks,

The main content of my starmapping.sh script looks like this:

#!/bin/bash
#SBATCH --job-name=starmapping_sbatch
#SBATCH --output=starmapping_sbatch.out
#SBATCH --error=starmapping_sbatch.err
#SBATCH --time=02:00:00
#SBATCH --partition=bigmem2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=10
#SBATCH --mem-per-cpu=20G

#get the input data file
INPUT=${1}

module load STAR
STAR --genomeDir /scratch/midway2/caiqi/GRCh37_star_index_150bp \
--runThreadN 16 \
--readFilesIn ${INPUT}.fastq \
--outFileNamePrefix GRCh37${INPUT} \
--twopassMode Basic \
 ...
RNA-Seq STAR alignment rna-seq • 2.3k views

You could simply do something like

for i in SRR1 SRR2 SRR3 .. SRR10; do sbatch starmapping.sh ${i}; done

If your jobs are not going to get resources right away, they will simply pend. They should then start executing, depending on the job slots/resources allocated to your account, until they all complete.

If you absolutely want to submit the jobs at midnight, you could put the for loop in a script and schedule that script to run using the Unix at command.
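A minimal sketch of what such a loop script could look like. The sample IDs are placeholders, and submit_all and DRY_RUN are names made up for this example. By default it only prints the sbatch commands, so the list can be checked before anything is submitted.

```shell
#!/bin/bash
# loop_script.sh: submit one starmapping.sh job per sample.
# With DRY_RUN=1 (the default here) the sbatch commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: takes sample IDs as arguments.
submit_all() {
    for sample in "$@"; do
        if [ "${DRY_RUN}" = "1" ]; then
            echo "sbatch starmapping.sh ${sample}"
        else
            sbatch starmapping.sh "${sample}"
        fi
    done
}

# Placeholder sample IDs; replace them with the real accessions.
submit_all SRR1 SRR2 SRR3
```

Once the printed commands look right, change DRY_RUN to 0 in the script. Scheduling it could then be something like: at -f loop_script.sh midnight (assuming at is enabled for users on your login node).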

Note: You are asking for 16 threads on your STAR command line but requesting only 10 tasks per node (and 1 node) on SLURM side. You should match those two numbers, either up or down.
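One way to keep the two numbers in sync is to read the thread count from the environment SLURM sets for the job instead of hard-coding it. This is only a sketch: star_threads is a hypothetical helper, and it assumes SLURM_CPUS_PER_TASK is set when --cpus-per-task is requested and SLURM_NTASKS_PER_NODE when --ntasks-per-node is requested.

```shell
# Hypothetical helper: pick STAR's thread count from what SLURM allocated.
star_threads() {
    # Prefer --cpus-per-task, then --ntasks-per-node;
    # fall back to 1 when run outside a SLURM job.
    echo "${SLURM_CPUS_PER_TASK:-${SLURM_NTASKS_PER_NODE:-1}}"
}

THREADS="$(star_threads)"
echo "STAR would run with --runThreadN ${THREADS}"
```

In the mapping script, the STAR line would then use --runThreadN ${THREADS} rather than a fixed 16.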


Thanks!

I will put the for loop in a "loop_script.sh" file and run it with bash. I will report back on how it works.


Regarding your comment: "Note: You are asking for 16 threads on your STAR command line but requesting only 10 tasks per node (and 1 node) on SLURM side. You should match those two numbers, either up or down".

I have been using this script for a while. Will this cause problems for downstream analysis?

Thanks again for the reminder.


No, it won't cause a problem with the data/analysis. Even though STAR asks for 16 cores, it is only going to get 10 from SLURM, so your jobs may be running a little slower than they could.


As genomax said, no. The opposite (reserving more resources than are used), however, is worse: it leads to under-utilization of the cluster and, if your cluster usage is billed, to higher bills with no gain in performance.


Thanks a lot for your comments. I am going to revise the script accordingly.


I just checked the progress and it worked well. One question: the starmapping_sbatch.err and starmapping_sbatch.out files were overwritten by each job. Is there a way to keep separate files for each fastq?


Try the following.

Option 1:

Replace these two lines in your starmapping.sh script:

#SBATCH --output=starmapping_sbatch.out
#SBATCH --error=starmapping_sbatch.err

With:

#SBATCH --output=$1.out
#SBATCH --error=$1.err

Test it to make sure it works by using echo.
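One caveat on Option 1, as far as I know: sbatch parses the #SBATCH lines itself, without shell expansion, so $1 there would likely be taken literally rather than replaced by the sample name. A possible workaround is to set the job name per sample on the command line and use sbatch's %x (job name) and %j (job ID) filename patterns. In the sketch below, build_cmd is a hypothetical helper and the sample IDs are placeholders. It only prints the commands, in the spirit of testing with echo first.

```shell
# Hypothetical helper: print the sbatch command for one sample.
# Options given on the command line take precedence over the
# #SBATCH lines inside the script.
build_cmd() {
    echo "sbatch --job-name=$1 --output=%x_%j.out --error=%x_%j.err starmapping.sh $1"
}

# Placeholder sample IDs; replace them with the real accessions.
for i in SRR1 SRR2 SRR3; do
    build_cmd "${i}"
done
```

Each job would then write to files named after its sample and job ID, so nothing is overwritten.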

Option 2:

Remove these lines from your script

#SBATCH --output=starmapping_sbatch.out
#SBATCH --error=starmapping_sbatch.err

And then try

for i in SRR1 SRR2 SRR3 .. SRR10; do sbatch -o ${i}.out -e ${i}.err --wrap="starmapping.sh ${i}"; done

Many thanks for the comprehensive solutions!


Hi genomax:

Today I got some new fastq files and tested your last advice (Option 1 and Option 2). For Option 1, where should I put the echo command? I ran Option 2, but I got an error in one of the .err files:

$ cat SRR5991068.err 
/tmp/slurmd/job5699485/slurm_script: line 4: starmapping.sh: command not found

After modification as advised in option2, line 4 will be:

#SBATCH --partition=bigmem2

So, what/where could be the problem?

Thank you,


The echo command is just to make sure the command lines look correct. You can add echo $1 to the script and check that it prints the name of the current file. $1 should be interpreted as the name of the file the program is currently working on.

I don't understand why you got the error you posted above.
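A possible explanation, offered as a guess rather than a diagnosis: with --wrap, sbatch generates a small wrapper script of its own (the slurm_script in the error message), so "line 4" would refer to that wrapper and not to starmapping.sh. Inside the wrapper, starmapping.sh is called by bare name, and the shell only looks for bare names on PATH, which normally does not include the current directory. The sketch below reproduces that lookup behaviour without SLURM, using a throwaway directory and a dummy starmapping.sh.

```shell
# Reproduce the lookup problem without SLURM: a script in the current
# directory is not found by bare name unless "." is on PATH.
workdir="$(mktemp -d)"
cd "${workdir}" || exit 1

# Dummy stand-in for the real mapping script.
printf '#!/bin/sh\necho "mapping $1"\n' > starmapping.sh
chmod +x starmapping.sh

if command -v starmapping.sh >/dev/null 2>&1; then
    echo "found on PATH"
else
    echo "not found by bare name"
fi

# Calling it with an explicit path works.
./starmapping.sh SRR5991068
```

If that is the cause, either --wrap="bash starmapping.sh ${i}" or dropping --wrap altogether (sbatch -o ${i}.out -e ${i}.err starmapping.sh ${i}) should avoid the error. Dropping --wrap also keeps the #SBATCH lines in the script in effect, since I believe they are not read when the script is only called from inside a wrapper.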
