for loop over bwa aln
1
0
Entering edit mode
5.9 years ago
phylofun ▴ 50

My for loop around bwa aln and bwa sampe doesn't seem to work. First, I created paths to the read data:

ls -1r  reads/*/cleaned/*READ1.fastq.gz > READ1_list
ls -1r  reads/*/cleaned/*READ2.fastq.gz > READ2_list

Next, I made a for loop using READ1_list and READ2_list around the bwa aln function:

for i in READ1_list 
do
  bwa aln reference/reference.unaligned.fasta $i > ${i%READ1.fastq.gz}read1.sai 
done

for i in READ2_list
do
  bwa aln reference/reference.unaligned.fasta $i > ${i%READ2.fastq.gz}read1.sai 
done

Then, I use a for loop around bwa sampe:

 for i in $(cat READ1_list | sed s'/\-READ1.fastq.gz//' | sort) 
do
  bwa sampe reference/reference.unaligned.fasta $i\-READ1.sai $i\-READ2.sai $i\-READ1.fastq.gz $i\-READ2.fastq.gz \
  | samtools view -bS - > $i\-pe.bam
done

Can anyone spot the issue?

bwa alignment next-gen • 3.1k views
ADD COMMENT
1
Entering edit mode

I can't spot the issue. What happens? Are there error messages?

Unless you reads are shorter than 70bp, you should be using bwa mem instead of bwa aln.

ADD REPLY
0
Entering edit mode

There is an error message when I run bwa sampe, but I think the issue begins with bwa aln. No error message occurs when I run bwa aln; however, the .sai output file goes to the current working directory instead of following the path in READ1_list, and the sample name is not written for the .sai file (e.g. READ1_listread1.sai instead of sample_name-read1.sai)

ADD REPLY
0
Entering edit mode

Put an echo in your loop to see how it looks like:

 for i in $(cat READ1_list | sed s'/\-READ1.fastq.gz//' | sort) 
do
  echo "bwa sampe reference/reference.unaligned.fasta $i\-READ1.sai $i\-READ2.sai $i\-READ1.fastq.gz $i\-READ2.fastq.gz \
  | samtools view -bS - > $i\-pe.bam"
done

Please note that you could also do the following and pipe directly to samtools sort, without samtools view:

bwa sample ref.fa R1 R2 | samtools sort -o alignment.bam
ADD REPLY
0
Entering edit mode

I see two problems in your code:

  1. The second alignment outputs a read1.sai file, it should be read2.sai
  2. The final BAM file requires files named as READ but your alignments are read (file names are generally case-sensitive)

Also it is hard to find the errors without any detail, are you running this on Linux/Mac or Windows?, what are the names of the input files?, what is inside the READ1_list and READ2_list files?

ADD REPLY
0
Entering edit mode
for i in READ1_list

That shouldn't even work. You should instead e.g.

for i in $(cat READ1_list); do ...

Just try e.g.

for i in READ1_list; do echo "$i"; done

vs.

for i in $(cat READ1_list); do echo "$i"; done

Only the latter works in bash (as intended)..

But anyway, I wouldn't even use a for loop for this but instead just define a function and use it with find and GNU parallel. Also, why are you aligning the reads separately? Surely it would make more sense to align read pairs properly in the same runs?

ADD REPLY
0
Entering edit mode
5.9 years ago
Joe 21k

This is overkill, but you might be interested in my bash code for aligning/remapping: (You'll mainly want to look at Line 95 onward, everything before that is just picking up commandline args etc.)

With a little more bash magic, you can loop your files and pass the relevant paths as arguments:

paste -- "READS1.txt" "READS2.txt" |
while IFS=$'\t' read -r reads1 reads2; do
 bash align_and_remap.sh -r /path/to/reference.fasta -1 "$reads1" -2 "$reads2" -o /path/to/outdir
done

*Disclaimer, this script is not well tested at all. Do some testing of you own before letting it loose on all your data. This is also based on an old setup with bwa aln where you had to provide each read file separately. That doesn't need to happen with bwa mem I believe.

ADD COMMENT

Login before adding your answer.

Traffic: 1708 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6