11 months ago
bestone
Hello guys,
I want to run multiple jobs with the Slurm batch command, but I couldn't figure out how. I have a script, added below, but it doesn't work. Some of my data come from Illumina, so those samples have two files (fastq1 and fastq2), but other samples come from PacBio and have four different FASTQ files instead. How can I run all of them with only one command? Could you please help me with this issue?
#!/bin/bash
#SBATCH --time=24:00:00
#SBATCH --partition=barbun
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=u....@gmail.com
#SBATCH --ntasks-per-node=16
BWA=~/tmm/bwa.kit/bwa
SAMTOOLS=~/tmm/bwa.kit/samtools
PICARD=/truba/home/ue/Bioinformatic_workflow/programs/picard.jar
GATK=~/gatk4-ulak/gatk-4.2.6.1/gatk
REFSEQ=/truba/home/ue/Bioinformatic_workflow/ref_seq/prunus_armeniaca_gca.903112645.fasta
FASTQ_1=/truba/home/ue/whole_genome/B11/B11_1.fq.gz
FASTQ_2=/truba/home/ue/whole_genome/B11/B11_2.fq.gz
OUTPUT_DIR=/truba/home/ue/Bioinformatic_workflow/b11_workflow/output_files_for
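# Positional arguments override the hard-coded paths above, so validate them before use.
if [ $# -lt 5 ]; then
echo "Usage: $0 REFSEQ FASTQ_1 FASTQ_2 SAMPLE_NAME OUTPUT_DIR"
exit 1
fi
REFSEQ=$1
FASTQ_1=$2
FASTQ_2=$3
SAMPLE_NAME=$4
OUTPUT_DIR=$5
# Align the paired-end reads; the read group tags each read with the sample name.
"$BWA" mem -t "$SLURM_NTASKS_PER_NODE" -R "@RG\tID:$SAMPLE_NAME\tSM:$SAMPLE_NAME\tPL:ILLUMINA" "$REFSEQ" "$FASTQ_1" "$FASTQ_2" > "$OUTPUT_DIR/${SAMPLE_NAME}_output.sam"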
The script uses BWA-MEM to align your paired-end files, which is fine. For PacBio, though, it's better to use another aligner such as Minimap2. You can either align each FASTQ separately and then merge the BAMs, or merge the FASTQs per sample and run the aligner once.
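To make the first option concrete, here is a rough sketch of aligning each PacBio FASTQ separately and merging the sorted BAMs. It is only an illustration under some assumptions: the function name `align_pacbio`, the `run` helper, the `DRY_RUN` switch and the output file naming are all made up for this example, `minimap2` and `samtools` are assumed to be on your `PATH`, and the `map-pb` preset assumes CLR reads (HiFi data would normally use `map-hifi` on recent Minimap2 versions).

```shell
#!/bin/bash
# Sketch only: tool locations can be overridden through the environment.
MINIMAP2=${MINIMAP2:-minimap2}
SAMTOOLS=${SAMTOOLS:-samtools}

# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() { if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi; }

# Usage: align_pacbio REFSEQ SAMPLE_NAME OUTPUT_DIR FASTQ [FASTQ ...]
align_pacbio() {
    local refseq=$1 sample=$2 outdir=$3
    shift 3
    local threads=${SLURM_CPUS_ON_NODE:-4}   # falls back to 4 outside Slurm
    local bams=() fq n=0 prefix
    for fq in "$@"; do
        n=$((n + 1))
        prefix=$outdir/${sample}_part$n
        # One read group ID per input file, all sharing the same sample (SM) tag.
        run "$MINIMAP2" -ax map-pb -t "$threads" \
            -R "@RG\tID:${sample}.part$n\tSM:$sample\tPL:PACBIO" \
            -o "$prefix.sam" "$refseq" "$fq" || return 1
        run "$SAMTOOLS" sort -@ "$threads" -o "$prefix.bam" "$prefix.sam" || return 1
        bams+=("$prefix.bam")
    done
    # Merge the per-file sorted BAMs into one BAM for the sample.
    run "$SAMTOOLS" merge "$outdir/$sample.merged.bam" "${bams[@]}"
}
```

Setting `DRY_RUN=1` before calling `align_pacbio ref.fa P01 out a.fq.gz b.fq.gz` prints the commands without running them, which is a cheap way to check the file names before submitting the real job.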
Thank you so much for your reply, JC. But what I want is to run all of these analyses with a single command. For example, once all the FASTQ files have been aligned with BWA, they should be processed with samtools and then with GATK, all from one command. I don't want to write separate commands for each file after the BWA step.
As JC explained, this isn't really recommended, and you will have to treat paired-end and non-paired-end files differently. If you try to mix these processes in one job (with some sort of automatic detection based on file name, which is definitely feasible), you are more likely to create a mess. It also defeats the proper use of a cluster. Instead, I recommend keeping these "low complexity" scripts that each do a single task properly.
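One way to keep the single-task scripts and still launch everything with one command is a small driver that submits one job per sample. The sketch below is only an illustration: the sample sheet layout (sample name, platform, then the FASTQ files, separated by spaces), the function name `submit_samples`, and the script names `align_illumina.sh` and `align_pacbio.sh` are all assumptions, not something from the original post. The platform is stated explicitly in the sheet rather than guessed from file names.

```shell
#!/bin/bash
# Sketch only: set SBATCH=echo to print the submissions instead of sending them.
SBATCH=${SBATCH:-sbatch}

# Usage: submit_samples SAMPLE_SHEET REFSEQ OUTPUT_DIR
# Each sheet line (assumed layout): SAMPLE_NAME PLATFORM FASTQ [FASTQ ...]
submit_samples() {
    local sheet=$1 refseq=$2 outdir=$3
    local sample platform fastqs
    while read -r sample platform fastqs; do
        # Skip blank lines and comment lines.
        case "$sample" in ''|\#*) continue ;; esac
        case "$platform" in
            illumina)
                # Same argument order as the BWA script in the question.
                # $fastqs is left unquoted on purpose so it splits into files.
                $SBATCH --job-name="$sample" align_illumina.sh \
                    "$refseq" $fastqs "$sample" "$outdir" ;;
            pacbio)
                # Hypothetical PacBio script taking any number of FASTQ files.
                $SBATCH --job-name="$sample" align_pacbio.sh \
                    "$refseq" "$sample" "$outdir" $fastqs ;;
            *)
                echo "Unknown platform '$platform' for sample $sample" >&2 ;;
        esac
    done < "$sheet"
}
```

With a sheet containing a line such as `B11 illumina B11_1.fq.gz B11_2.fq.gz`, calling `submit_samples samples.txt ref.fasta output_dir` would queue one independent job per sample, so each job stays simple and the scheduler can run them in parallel.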