So, I have problems with a script for mapping and with creating sam and bam files to eventually get to a vcf.
My input files look like this: 262 files, paired reads, with the naming convention:
AH-00001_S117_L001_R1_P_trimmed.001.fastq
AH-00001_S117_L001_R2_P_trimmed.001.fastq
KewS01_S470_L001_R1_P_trimmed.001.fastq
KewS01_S470_L001_R2_P_trimmed.001.fastq
My Script looks like this:
#!/bin/bash
(...)
#SBATCH --array=1-262
filenameFWD=$(ls -1 *_L001_R1_P_trimmed.001.fastq | tail -n +"${SLURM_ARRAY_TASK_ID}" | head -1)
filenameREV=$(ls -1 *_L001_R2_P_trimmed.001.fastq | tail -n +"${SLURM_ARRAY_TASK_ID}" | head -1)
prefixFWD=$(echo "$filenameFWD" | sed 's/_L001_R1_P_trimmed\.001\.fastq$//')
prefixREV=$(echo "$filenameREV" | sed 's/_L001_R2_P_trimmed\.001\.fastq$//')
module load NextGenMap
ngm -r Reference.fasta -1 "${prefixFWD}"_L001_R1_P_trimmed.001.fastq -2 "${prefixREV}"_L001_R2_P_trimmed.001.fastq -t 12 -o "${prefixFWD}".sam "${prefixREV}".sam
module load SAMtools/1.3.1
samtools view -bS "${prefixFWD}".sam "${prefixREV}".sam | samtools sort -o "${prefixFWD}".bam "${prefixREV}".bam
samtools index "${prefixFWD}".bam "${prefixREV}".bam`
And in the end I get the following Error:
NGM] Opening for output (SAM): .sam
(...)
[INPUT] Input is paired end data.
[INPUT] Opening file _L001_R1_P_trimmed.001.fastq for reading
[INPUT] Opening file _L001_R2_P_trimmed.001.fastq for reading
[INPUT] File _L001_R1_P_trimmed.001.fastq does not exist!
This error is fatal. Quitting...
Why doesn't it find the files ? (all files, the reference and the script are in the same directory) in the end it results only in exactly half of the desired sam files for each naming convention and nothing more.
Anyone knows how to fix this or where my approach fails ?
Side note: You don't need the second
ls
operation for the other FASTQ, simply replaceR1
withR2
. Also,basename
will help avoidsed
since the task does not have any regex involved:Why would
prefixREV
be any different fromprefixFWD
? Isn't the only differenceR1
vsR2
?Also, do not rely on
ls
yielding the same results everytime when you're writing to the same directory you're reading from. Instead, do:and then in the slurm script,
hey thanks thats smart! But how do you mean this part
ls -1 *R1*fastq | xargs -I v_f basename v_f _L001_R1_P_trimmed.001.fastq > smpl_names.txt
what should be the smpl_names.txt ? the sam and bam files should have the prefix.sam and prefix.bam as name, no ?
So how would the whole code look like in regard to the code I provided?
That's the file name for the output. Learn more about redirection in linux - Google the term. And yes, once you get the
$prefix
, you can use it wherever you need the prefix.