I have 1000 samples that I need to run through a certain pipeline. Are there any programs that can generate shell scripts for each sample?
For example, if I have a command to run BWA on a sample, e.g.
bwa mem -R "@RG\tID:[sample]\tPL:ILLUMINA\tLB:lib1" $REF [sample]_1.fastq [sample]_2.fastq > [sample].sam
How can I generate this script for 1000 samples (replacing "[sample]" with the respective sample ID)? Are there any tools that do this sort of batch processing? I've heard of people using make, but I'm unsure of what they mean.
I know I can write a loop that processes each file separately; however, I'd like to submit the jobs separately, as there is a time limit on each job I can submit to my cluster.
Depending on your sample naming scheme, perhaps use a for loop with seq:
#!/bin/bash
for idx in $(seq 1 1000)
do
    # Build the sample ID from the loop index, e.g. sample_1, sample_2, ...
    SAMPLE_ID="sample_${idx}"
    bwa mem -R "@RG\tID:${SAMPLE_ID}\tPL:ILLUMINA\tLB:lib1" "$REF" "${SAMPLE_ID}_1.fastq" "${SAMPLE_ID}_2.fastq" > "${SAMPLE_ID}.sam"
done
If it isn't clear what this does, you might first run this to see how it works:
#!/bin/bash
for idx in $(seq 1 1000)
do
    SAMPLE_ID="sample_${idx}"
    echo "${SAMPLE_ID}"
done
If you are submitting jobs to a cluster (say, with qsub) then you just qsub within the for loop:
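#!/bin/bash
for idx in $(seq 1 1000)
do
    SAMPLE_ID="sample_${idx}"
    # Send the command to qsub on stdin so that the redirect to the .sam file
    # runs inside the job instead of capturing qsub's own output
    echo "bwa mem -R '@RG\tID:${SAMPLE_ID}\tPL:ILLUMINA\tLB:lib1' $REF ${SAMPLE_ID}_1.fastq ${SAMPLE_ID}_2.fastq > ${SAMPLE_ID}.sam" | qsub /* ...options... */
done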
To be a good neighbor to your fellow cluster users, you will want to debug and understand how this script will run, before you submit 1000 jobs to your cluster.
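One way to do that kind of check is a dry run that only prints the commands for a handful of samples and warns about missing input files. This is just a sketch: the sample_N naming, the choice of three samples, the function name print_commands and the fallback reference.fa are all assumptions, so adjust them to your data.

```shell
#!/bin/bash
# Dry-run sketch: nothing is executed or submitted, commands are only printed.
# reference.fa is a hypothetical fallback used when REF is not set.
REF="${REF:-reference.fa}"

print_commands() {
    # $1 = number of samples to preview (assumed naming: sample_1, sample_2, ...)
    for idx in $(seq 1 "$1")
    do
        SAMPLE_ID="sample_${idx}"
        # Warn on stderr about missing inputs before anything is submitted.
        for fq in "${SAMPLE_ID}_1.fastq" "${SAMPLE_ID}_2.fastq"
        do
            [ -f "$fq" ] || echo "WARNING: missing ${fq}" >&2
        done
        # Print the command that would run for this sample.
        printf '%s\n' "bwa mem -R \"@RG\tID:${SAMPLE_ID}\tPL:ILLUMINA\tLB:lib1\" $REF ${SAMPLE_ID}_1.fastq ${SAMPLE_ID}_2.fastq > ${SAMPLE_ID}.sam"
    done
}

print_commands 3
```

Once the printed commands look right for a few samples, submitting one or two real jobs before the full 1000 is a cheap extra check.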
Thanks for your help. I understand how to do this in a loop, but our cluster does not allow us to submit commands through qsub. Instead we may submit shell scripts. I'm trying to generate these files in a quick way. Is there a way to make the loop do this?
You might look into generating a template script file, and then using sed to replace a placeholder keyword in the template with your sample ID value.
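As a rough sketch of that idea: this assumes a template file called template.sh that contains XXXX wherever the sample ID should go, samples named sample_1 to sample_1000, and an output directory called jobs. All of those names are made up for illustration, and the script writes an example template only if one doesn't exist yet.

```shell
#!/bin/bash
# Sketch only: template.sh, jobs/ and the sample_N naming are assumptions.
set -euo pipefail

TEMPLATE="template.sh"
OUT_DIR="jobs"

# Write an example template if none exists; XXXX marks the sample ID.
# The quoted 'EOF' keeps $REF and \t literal inside the template.
if [ ! -f "$TEMPLATE" ]; then
    cat > "$TEMPLATE" <<'EOF'
#!/bin/bash
bwa mem -R "@RG\tID:XXXX\tPL:ILLUMINA\tLB:lib1" "$REF" XXXX_1.fastq XXXX_2.fastq > XXXX.sam
EOF
fi

mkdir -p "$OUT_DIR"
for idx in $(seq 1 1000)
do
    SAMPLE_ID="sample_${idx}"
    # Replace every XXXX in the template with this sample's ID
    # and write the result to its own script file.
    sed "s/XXXX/${SAMPLE_ID}/g" "$TEMPLATE" > "${OUT_DIR}/${SAMPLE_ID}.sh"
done
```

Each generated file (jobs/sample_1.sh and so on) can then be submitted as its own job with whatever qsub options your cluster needs. Two caveats: REF has to be set in the job's environment, since the template leaves it unexpanded, and sed treats / and & specially in the replacement, so this assumes sample IDs without those characters.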
Thank you, this is exactly what I ended up doing! For anyone interested, here is what I did:
"XXXX" was the placeholder text for the sample name in the template file I created.