My for loop around bwa aln and bwa sampe doesn't seem to work. First, I created paths to the read data:
ls -1r reads/*/cleaned/*READ1.fastq.gz > READ1_list
ls -1r reads/*/cleaned/*READ2.fastq.gz > READ2_list
Next, I made a for loop using READ1_list and READ2_list around the bwa aln function:
for i in READ1_list
do
bwa aln reference/reference.unaligned.fasta $i > ${i%READ1.fastq.gz}read1.sai
done
for i in READ2_list
do
bwa aln reference/reference.unaligned.fasta $i > ${i%READ2.fastq.gz}read1.sai
done
Then, I use a for loop around bwa sampe:
for i in $(cat READ1_list | sed s'/\-READ1.fastq.gz//' | sort)
do
bwa sampe reference/reference.unaligned.fasta $i\-READ1.sai $i\-READ2.sai $i\-READ1.fastq.gz $i\-READ2.fastq.gz \
| samtools view -bS - > $i\-pe.bam
done
There is an error message when I run bwa sampe, but I think the issue begins with bwa aln. No error message occurs when I run bwa aln; however, the .sai output file goes to the current working directory instead of following the path in READ1_list, and the sample name is not written for the .sai file (e.g. READ1_listread1.sai instead of sample_name-read1.sai)
The second alignment outputs a read1.sai file, it should be read2.sai
The final BAM file requires files named as READ but your alignments are read (file names are generally case-sensitive)
Also it is hard to find the errors without any detail, are you running this on Linux/Mac or Windows?, what are the names of the input files?, what is inside the READ1_list and READ2_list files?
But anyway, I wouldn't even use a for loop for this but instead just define a function and use it with find and GNU parallel. Also, why are you aligning the reads separately? Surely it would make more sense to align read pairs properly in the same runs?
This is overkill, but you might be interested in my bash code for aligning/remapping:
(You'll mainly want to look at Line 95 onward, everything before that is just picking up commandline args etc.)
With a little more bash magic, you can loop your files and pass the relevant paths as arguments:
*Disclaimer, this script is not well tested at all. Do some testing of you own before letting it loose on all your data. This is also based on an old setup with bwa aln where you had to provide each read file separately. That doesn't need to happen with bwa mem I believe.
I can't spot the issue. What happens? Are there error messages?
Unless you reads are shorter than 70bp, you should be using
bwa mem
instead ofbwa aln
.There is an error message when I run bwa sampe, but I think the issue begins with bwa aln. No error message occurs when I run bwa aln; however, the .sai output file goes to the current working directory instead of following the path in READ1_list, and the sample name is not written for the .sai file (e.g. READ1_listread1.sai instead of sample_name-read1.sai)
Put an echo in your loop to see how it looks like:
Please note that you could also do the following and pipe directly to samtools sort, without samtools view:
I see two problems in your code:
Also it is hard to find the errors without any detail, are you running this on Linux/Mac or Windows?, what are the names of the input files?, what is inside the READ1_list and READ2_list files?
That shouldn't even work. You should instead e.g.
Just try e.g.
vs.
Only the latter works in bash (as intended)..
But anyway, I wouldn't even use a for loop for this but instead just define a function and use it with find and GNU parallel. Also, why are you aligning the reads separately? Surely it would make more sense to align read pairs properly in the same runs?