Entering edit mode
2.6 years ago
rodolfo.peacewalker
▴
390
Hi!
I'm trying to align some paired-end fastq files derived from target sequencing using BWA-MEM2 and samtools to filter and sort mapped reads. To avoid the accumulation of bam files after every step, I figured out that a for loop using pipes would be useful. I share my code at the moment:
##Allign the reads with respect to the reference
for file in ../data/*.fastq.gz; do
SAMPLE=$(echo ${file} | sed -e 's/R1.fastq.gz//' | sed -e 's/R2.fastq.gz//')
R1="${SAMPLE}R1.fastq.gz"
R2="${SAMPLE}R2.fastq.gz"
SAMP=$(echo ${SAMPLE} | sed 's/.*data\///' | sed 's/\_clean_paired*//')
echo "Processing sample $SAMP"
bwa-mem2 mem -t 8 hsa_GRCh38 $R1 $R2 | samtools view -Sq >= 1 - | samtools sort - -o ../output/$SAMP.bam
done
I have checked with echo
that the name of the R1, R2 and output files (SAMP) are correctly generated.
However, after running the code I have the next error message from samtools:
Processing sample Q-0002-18_S10_
samtools sort: failed to read header from "-"
I have checked the code for a single sample and works; however, I wonder what is going on when implement the for loop.
Thanks in advance!
Your for loop should look something like this. Note the below options are untested.
You can do all of this directly using find too.
GNU parallel option since it's convenient.
I see a few problems, not sure how your single sample run worked.
>
is shell redirection. How does yoursamtools view
command work at all?-S
is ignored and-q
takes an INT,>=1
is not a valid parameter to anything and should break your command.for f in ../data/*R1.fastq.gz
instead of a more generic glob, and use${f/R1.fastq.gz/}
instead of your double sed.Addendum to point 1:
You're running bwa-mem twice per sample because your loop runs per fastq file, not per fastq-pair. That's where your loop logic breaks.