I have a file named "starmapping.sh" which I use for mapping fastq files with STAR. My usage is:
sbatch starmapping.sh SRR123456
I have been using it to map individual fastq files for a while. Currently I have 10 fastq files; if I wait for one submitted job to finish and then submit another, I have to keep checking the queue status now and then.
I am wondering if there is a way to run starmapping.sh on all the fastq files sequentially so that I don't need to submit them one after the other. I ask because our university server allocates resources more readily around midnight. I want to submit the jobs at 12am and let them run, so that the next day I can do the downstream analysis.
Thanks,
The main content of my starmapping.sh script is like this:
#!/bin/bash
#SBATCH --job-name=starmapping_sbatch
#SBATCH --output=starmapping_sbatch.out
#SBATCH --error=starmapping_sbatch.err
#SBATCH --time=02:00:00
#SBATCH --partition=bigmem2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=10
#SBATCH --mem-per-cpu=20G
# get the input data file
INPUT=${1}
module load STAR
STAR --genomeDir /scratch/midway2/caiqi/GRCh37_star_index_150bp \
--runThreadN 16 \
--readFilesIn ${INPUT}.fastq \
--outFileNamePrefix GRCh37${INPUT} \
--twopassMode Basic \
...
You could simply do something like this:
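(a sketch, assuming the fastq files sit in your submission directory and are named like SRR123456.fastq; adjust the glob if your names differ)
for FILE in *.fastq; do
    SAMPLE=${FILE%.fastq}              # strip the .fastq suffix, since starmapping.sh appends it
    sbatch starmapping.sh "${SAMPLE}"
done
The loop passes the accession without the .fastq suffix because the script builds the file name itself (--readFilesIn ${INPUT}.fastq).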
If your jobs are not going to get resources right away they will simply pend. They will then start executing, depending on the job slots/resources allocated to your account, until they all complete.
If you absolutely want to submit the jobs before midnight then you could put the for loop in a script and schedule that to run using the at unix command.
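A minimal sketch, assuming the loop is saved as loop_script.sh in the directory you run at from (the time formats at accepts can vary slightly between systems):
# queue the submission script for midnight; at jobs start in the current working directory
echo "bash loop_script.sh" | at midnight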
Note: You are asking for 16 threads on your STAR command line but requesting only 10 tasks per node (and 1 node) on the SLURM side. You should match those two numbers, either up or down.
Thanks! I will put the for loop in a "loop_script.sh" file and run it with bash. I will come back and update on how it works.
Regarding your comment: "Note: You are asking for 16 threads on your STAR command line but requesting only 10 tasks per node (and 1 node) on the SLURM side. You should match those two numbers, either up or down."
I have been using this for a while; will it cause a problem for downstream analysis?
Thanks again for the reminder.
No, it won't cause a problem with the data/analysis. Even though STAR asks for 16 cores, it is only going to get 10 from SLURM, so your jobs may be running a little slower than they could.
As genomax said, no. The opposite (when one reserves more resources than used), however, is worse: it leads to under-utilization of the cluster and, if your cluster usage is billed, to higher bills with no return in performance.
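For reference, one way to keep the two numbers in sync is to let STAR read its thread count from the SLURM allocation instead of hard-coding it (a sketch; SLURM_NTASKS_PER_NODE is set in the job environment when --ntasks-per-node is requested):
#SBATCH --ntasks-per-node=10
# on the STAR command line, take the thread count from what SLURM actually granted
STAR --runThreadN "${SLURM_NTASKS_PER_NODE}" --genomeDir /scratch/midway2/caiqi/GRCh37_star_index_150bp ...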
Thanks a lot for your comments. I am going to revise it so the two numbers match.
I just checked the running progress. It worked well. One question: the starmapping_sbatch.err and starmapping_sbatch.out files were overwritten. Is there a way to keep them for each individual fastq?
Try the following.
Option 1:
Replace these two lines from your STARmapping script:
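(these are the output/error directives near the top of the script)
#SBATCH --output=starmapping_sbatch.out
#SBATCH --error=starmapping_sbatch.err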
With:
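(a sketch rather than the only possibility: %j is SLURM's job-ID filename pattern, so every submitted job writes its own pair of log files; %x would substitute the job name instead)
#SBATCH --output=starmapping_%j.out
#SBATCH --error=starmapping_%j.err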
Test it to make sure it works by using echo.
Option 2:
Remove these lines from your script
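(again, the two output/error directives)
#SBATCH --output=starmapping_sbatch.out
#SBATCH --error=starmapping_sbatch.err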
And then try
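(a sketch; sbatch's -o and -e options set the output/error file names from the command line, so each sample keeps its own pair)
for FILE in *.fastq; do
    SAMPLE=${FILE%.fastq}
    sbatch -o "${SAMPLE}_star.out" -e "${SAMPLE}_star.err" starmapping.sh "${SAMPLE}"
done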
Many thanks for the comprehensive solutions!
Hi genomax:
Today I got new fastq files and tested your latest advice. For option 1, where should I put the echo command? I ran option 2, but I got an error in one of the SRR .err files:
After modification as advised in option 2, line 4 will be:
So, what/where could be the problem?
Thank you,
The echo command is just to make sure the command lines look correct. You can add echo $1 to the script and check that it is printing the name of the current file. $1 should be interpreted as the name of the file the program is currently working on. I don't understand why you got the error you posted above.
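For example, right after the input is captured in starmapping.sh (a sketch; the echo can be dropped once the names look right):
INPUT=${1}
echo "${INPUT}"    # same as echo $1; should print the accession, e.g. SRR123456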