Mapping paired end reads with ngm and samtools, using prefixes and suffixes for creating vcf eventually
1
0
Entering edit mode
20 months ago
Human • 0

So, I have problems with a script for mapping and with creating sam and bam files to eventually get to a vcf.

My input files look like this: 262 files, paired reads, with the naming convention:

AH-00001_S117_L001_R1_P_trimmed.001.fastq 
AH-00001_S117_L001_R2_P_trimmed.001.fastq
KewS01_S470_L001_R1_P_trimmed.001.fastq 
KewS01_S470_L001_R2_P_trimmed.001.fastq 

My Script looks like this:

#!/bin/bash
(...)
#SBATCH --array=1-262

filenameFWD=$(ls -1 *_L001_R1_P_trimmed.001.fastq | tail -n +"${SLURM_ARRAY_TASK_ID}" | head -1) 
filenameREV=$(ls -1 *_L001_R2_P_trimmed.001.fastq | tail -n +"${SLURM_ARRAY_TASK_ID}" | head -1)

prefixFWD=$(echo "$filenameFWD" | sed 's/_L001_R1_P_trimmed\.001\.fastq$//')
prefixREV=$(echo "$filenameREV" | sed 's/_L001_R2_P_trimmed\.001\.fastq$//') 

module load NextGenMap

ngm -r Reference.fasta -1 "${prefixFWD}"_L001_R1_P_trimmed.001.fastq -2 "${prefixREV}"_L001_R2_P_trimmed.001.fastq -t 12 -o "${prefixFWD}".sam "${prefixREV}".sam

module load SAMtools/1.3.1

samtools view -bS "${prefixFWD}".sam "${prefixREV}".sam | samtools sort -o "${prefixFWD}".bam "${prefixREV}".bam
samtools index "${prefixFWD}".bam "${prefixREV}".bam`

And in the end I get the following Error:

NGM] Opening for output (SAM): .sam
(...)
[INPUT] Input is paired end data.
[INPUT] Opening file _L001_R1_P_trimmed.001.fastq for reading
[INPUT] Opening file _L001_R2_P_trimmed.001.fastq for reading
[INPUT] File _L001_R1_P_trimmed.001.fastq does not exist!
This error is fatal. Quitting...

Why doesn't it find the files ? (all files, the reference and the script are in the same directory) in the end it results only in exactly half of the desired sam files for each naming convention and nothing more.

Anyone knows how to fix this or where my approach fails ?

ngm sam samtools • 1.1k views
ADD COMMENT
0
Entering edit mode

Side note: You don't need the second ls operation for the other FASTQ, simply replace R1 with R2. Also, basename will help avoid sed since the task does not have any regex involved:

filenameFWD=$(ls -1 *_L001_R1_P_trimmed.001.fastq | tail -n +"${SLURM_ARRAY_TASK_ID}" | head -1) 
filenameREV=${filenameFWD/_R1_/_R2_}

prefixFWD=$(basename $filenameFWD _L001_R1_P_trimmed.001.fastq)

Why would prefixREV be any different from prefixFWD? Isn't the only difference R1 vs R2?

Also, do not rely on ls yielding the same results everytime when you're writing to the same directory you're reading from. Instead, do:

ls -1 *R1*fastq | xargs -I v_f basename v_f _L001_R1_P_trimmed.001.fastq > smpl_names.txt

and then in the slurm script,

prefix=$(head -n "${SLURM_ARRAY_TASK_ID}" smpl_names.txt | tail -n 1)
filenameFWD=${prefix}_L001_R1_P_trimmed.001.fastq
filenameREV=${prefix}_L001_R2_P_trimmed.001.fastq
..
..
ADD REPLY
0
Entering edit mode

hey thanks thats smart! But how do you mean this part

ls -1 *R1*fastq | xargs -I v_f basename v_f _L001_R1_P_trimmed.001.fastq > smpl_names.txt

what should be the smpl_names.txt ? the sam and bam files should have the prefix.sam and prefix.bam as name, no ?

So how would the whole code look like in regard to the code I provided?

ADD REPLY
0
Entering edit mode

what should be the smpl_names.txt

That's the file name for the output. Learn more about redirection in linux - Google the term. And yes, once you get the $prefix, you can use it wherever you need the prefix.

ADD REPLY
0
Entering edit mode
20 months ago
Ram 44k

The prefixXXX values seem to be empty - did you dry run the jobs and ensure each iteration picks up the values you want it to? Make the changes I recommended in my comment above to simplify your loop, then add an echo in front of your ngm and samtools lines and submit the batch. You will lose a few minutes of your compute hours but a dry run will ensure things work fine.

Once things check out, remove the echo statements to run the actual tools themselves.

ADD COMMENT
0
Entering edit mode

hey , thank you btw . you helped me a lot ! it worked now thanks to your help. now I am struggling again with the downstream processing to get to a useful vcf, would you like to have a look on my recent post maybe? you seem to be pretty knowledgeable. I hope thats not too intrusive. thank you in advance

ADD REPLY
0
Entering edit mode

How can I refuse when you're ask so nicely, I'll definitely take a look.

ADD REPLY

Login before adding your answer.

Traffic: 2153 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6