I am using BWA to align reads. I have four directories: seqtk_1, seqtk_2, seqtk_3 and seqtk_4. Each of those directories contains 10 subdirectories (subsample_1, subsample_2, subsample_3, etc.), and each subdirectory holds 20 paired-end read files (i.e. 10 genomes).
I want to run all the files from all the directories through a pipeline and write the results to an output directory (BWA) that has the same structure as described above. So I have 800 input files and 400 output files.
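If the BWA output tree does not exist yet, bash brace expansion can create all 40 mirrored subdirectories in one command. This is a small sketch, assuming bash (brace ranges like {1..4} are a bash feature, not POSIX sh) and that it is run from the directory where BWA/ should live. Creating the directories first matters because a redirect such as "> BWA/seqtk_1/subsample_1/x.sam" fails if the directory is missing.

```shell
#!/usr/bin/env bash
# Create BWA/seqtk_1..4/subsample_1..10 up front so that later redirects
# into these directories do not fail. Assumes bash for the {a..b} ranges.
set -euo pipefail
mkdir -p BWA/seqtk_{1..4}/subsample_{1..10}
```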
I have written a script (below):
echo "[info] creating filenames";
for filename in ./Mock_Run/seqtk_*/subsample_*/*_1.fq.gz; do
    file=`echo $filename|sed 's/_1.fq.gz//'`;
    filenopath=`basename $file`;
    for i in $(seq 10 $END); do
        echo subsample_$i;
        for u in $(seq 4 $END); do
            eval outpath=BWA/seqtk_$u/subsample_$i;
            echo "[info] starting BWA alignment...";
            bwa mem -v 0 combine_reference.fa.gz ${filenopath}_1.fq.gz ${filenopath}_2.fq.gz > ${outpath}/${filenopath}_BWA.sam;
            echo "[info] converting sam file to bam file";
            samtools view -bS ${outpath}/${filenopath}_BWA.sam > ${outpath}/${filenopath}_BWA.bam;
            echo "[info]filtering unmapped reads....";
            samtools view -h -f 4 ${outpath}/${filenopath}_BWA.bam > ${outpath}/${filenopath}_unmapped.bam;
            echo "[info] filtering mapped reads...";
            samtools view -h -F 4 ${outpath}/${filenopath}_BWA.bam > ${outpath}/${filenopath}_mapped.bam;
            echo "[info] sorting files";
            samtools sort -o ${outpath}/${filenopath}_mapped_sorted.bam ${outpath}/${filenopath}_mapped.bam ;
            samtools sort -o ${outpath}/${filenopath}_unmapped_sorted.bam ${outpath}/${filenopath}_unmapped.bam;
            echo "[info] finished...no error to report";
        done;
    done;
done
It loops through all the files and writes the output to the right subdirectory, as intended. Everything seems to work, except that it keeps looping: once it has gone through all the files, it starts again.
Any help would be appreciated.
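A likely cause, judging from the script: $END is never set, so $(seq 10 $END) expands to seq 10 (1 to 10) and $(seq 4 $END) to seq 4 (1 to 4). The two inner loops therefore repeat the whole BWA/samtools block 40 times for every input pair, writing each sample into every BWA/seqtk_*/subsample_* directory, which would explain why the script seems to start over. The inner loops are not needed, because the output directory can be derived from the input path itself. The following is a minimal sketch of that approach, not a tested pipeline. It assumes the script runs from the directory containing Mock_Run/ and combine_reference.fa.gz, that bwa and samtools are on PATH, and out_dir_for is a hypothetical helper name. It also passes the full input paths to bwa (the original passed only the basename, which works only if the reads are in the current directory) and adds -b to the filtering steps, since samtools view writes SAM text unless BAM output is requested.

```shell
#!/usr/bin/env bash
# Sketch of a fixed loop. Assumptions: run from the directory holding Mock_Run/
# and combine_reference.fa.gz; bwa and samtools are on PATH.
set -euo pipefail
shopt -s nullglob   # if the glob matches nothing, skip the loop entirely

ref=combine_reference.fa.gz

# Hypothetical helper: map ./Mock_Run/seqtk_N/subsample_M/<sample>_1.fq.gz
# to BWA/seqtk_N/subsample_M, so each pair gets exactly one output directory.
out_dir_for() {
    local subsample_dir seqtk_dir
    subsample_dir=$(dirname "$1")
    seqtk_dir=$(dirname "$subsample_dir")
    printf 'BWA/%s/%s\n' "$(basename "$seqtk_dir")" "$(basename "$subsample_dir")"
}

for read1 in ./Mock_Run/seqtk_*/subsample_*/*_1.fq.gz; do
    prefix=${read1%_1.fq.gz}        # full path without the _1.fq.gz suffix
    read2=${prefix}_2.fq.gz
    sample=$(basename "$prefix")
    outpath=$(out_dir_for "$read1")
    mkdir -p "$outpath"

    echo "[info] aligning $sample -> $outpath"
    # Full paths to both mates, not just the basename.
    bwa mem -v 0 "$ref" "$read1" "$read2" > "$outpath/${sample}_BWA.sam"
    samtools view -b "$outpath/${sample}_BWA.sam" > "$outpath/${sample}_BWA.bam"
    # -b makes samtools view write BAM; without it the output is SAM text.
    samtools view -b -f 4 "$outpath/${sample}_BWA.bam" > "$outpath/${sample}_unmapped.bam"
    samtools view -b -F 4 "$outpath/${sample}_BWA.bam" > "$outpath/${sample}_mapped.bam"
    samtools sort -o "$outpath/${sample}_mapped_sorted.bam" "$outpath/${sample}_mapped.bam"
    samtools sort -o "$outpath/${sample}_unmapped_sorted.bam" "$outpath/${sample}_unmapped.bam"
done
echo "[info] finished"
```

Because every *_1.fq.gz file maps to exactly one output directory, the loop body runs once per read pair (400 times for the layout described) and then stops.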
Have a look at a workflow manager such as Nextflow; it takes some learning, but it pays off as soon as the workflow gets more complex.
Thank you, I will definitely look at it.
Just a remark: in my opinion there is no advantage in storing the SAM files, as they are plain text rather than binary and therefore take up a lot of space. Since most downstream applications require sorted BAM files anyway, it is better to pipe BWA directly into samtools sort.
I tried to do that but couldn't get it to work. I'll change my script. Thank you!
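The suggested pipe could look roughly like this for one read pair. It is a sketch, assuming bwa and samtools are on PATH; align_sorted and its argument names are hypothetical. The "-" argument tells samtools sort to read from standard input, so no intermediate SAM file is written. Filtering after sorting keeps the mapped and unmapped subsets in sorted order, which replaces the two separate sort steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: align one pair and write a coordinate-sorted BAM without
# an intermediate SAM file. Assumes bwa and samtools are on PATH.
set -euo pipefail

align_sorted() {
    local ref=$1 read1=$2 read2=$3 out_prefix=$4
    # BWA's SAM output goes straight into samtools sort ("-" = read stdin).
    bwa mem -v 0 "$ref" "$read1" "$read2" \
        | samtools sort -o "${out_prefix}_sorted.bam" -
    # Filtering a sorted BAM preserves the order, so no second sort is needed.
    samtools view -b -F 4 -o "${out_prefix}_mapped_sorted.bam" "${out_prefix}_sorted.bam"
    samtools view -b -f 4 -o "${out_prefix}_unmapped_sorted.bam" "${out_prefix}_sorted.bam"
}
```

For a single pair it would be called as align_sorted combine_reference.fa.gz sample_1.fq.gz sample_2.fq.gz BWA/seqtk_1/subsample_1/sample (the file names here are placeholders).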