Hi everyone,
I ran into an issue using bwa on some very short reads (49 bp). I have numerous paired-end Illumina reads that I want to align against an assembly I made.
I saw this previous post about aligning paired reads from several files: "BWA mem on multiple samples", which proposes running bwa mem in parallel.
The bwa README says that, for reads shorter than 70 bp, we should proceed this way:
bwa aln ref.fa read1.fq > read1.sai; bwa aln ref.fa read2.fq > read2.sai
bwa sampe ref.fa read1.sai read2.sai read1.fq read2.fq > aln-pe.sam
When I tried this, the bwa aln step worked fine, but the bwa sampe step never finished. Considering that each file contains approximately 8-10 GB of data, I have no idea why it takes so long (I stopped the process after 2 days).
What do you think about this? Should I use another aligner?
Thanks for your advice,
Roxane
I recommend bbmap.sh from the BBMap suite. It is fast, easy to use, multi-threaded, and pure Java, so it will run pretty much anywhere. As long as you have samtools in your path, you can create BAM files directly during alignment.
Okay, thanks for your fast reply. Do you think it will be suited to very short reads like mine?
Yes. (Had to add this to reach min char limit).
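For reference, a minimal sketch of what a paired-end BBMap run could look like for the files named in the question. The wrapper function run_bbmap and the DRY_RUN switch are hypothetical helpers added for illustration, and the output name and thread count are assumptions; ref=, in=, in2=, out= and threads= are standard bbmap.sh parameters.

```shell
#!/bin/sh
# Hypothetical wrapper around bbmap.sh for one pair of FASTQ files.
# Arguments: reference, read1, read2, output file, thread count (default 4).
# With DRY_RUN=1 the command is printed instead of executed.
run_bbmap() {
    ref=$1; r1=$2; r2=$3; out=$4; threads=${5:-4}
    # Build the command line as positional parameters so quoting is preserved.
    set -- bbmap.sh "ref=$ref" "in=$r1" "in2=$r2" "out=$out" "threads=$threads"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example (assumed file names from the question):
#   run_bbmap ref.fa read1.fq read2.fq aln-pe.sam 8
```

Giving the output a .bam extension instead of .sam is what relies on samtools being in the path.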
I had a situation where bwa sampe was abnormally slow. The problem turned out to be that the reverse reads (read2) had a sequencing defect: they were mostly homopolymer stretches that aligned wrongly at multiple places on the genome, and bwa sampe was spending a lot of time evaluating a large number of equally bad possibilities...
Oh, this is interesting. How did you detect this problem? Is FastQC enough to detect such an issue? How did you solve it?
I inspected the file contents directly with the zless command, but FastQC would have shown the problem for sure. After double-checking that we had not made any obvious error, we contacted Illumina's technical support, and indeed this time the problem was on the sequencer's side, so they sent us a free kit, which worked perfectly.
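If you want a quick number rather than eyeballing the file, something along these lines could work. It is only a rough sketch, not what was actually done above: the function name count_homopolymer_reads and the default run length of 20 are assumptions, and it expects plain 4-line FASTQ records (no wrapped sequences).

```shell
#!/bin/sh
# Print "<flagged> <total>": the number of reads containing a run of at least
# MIN_RUN identical bases (A, C, G, T or N), and the total number of reads.
# Arguments: FASTQ file (or "-" for stdin), MIN_RUN (default 20, an arbitrary choice).
count_homopolymer_reads() {
    fastq=$1
    min_run=${2:-20}
    awk -v n="$min_run" '
        BEGIN {
            # Build one literal pattern per base, e.g. "AAAAA" for n = 5.
            split("A C G T N", bases, " ")
            for (i = 1; i <= 5; i++) {
                pat[i] = ""
                for (j = 0; j < n; j++) pat[i] = pat[i] bases[i]
            }
        }
        # Line 2 of every 4-line record is the sequence.
        NR % 4 == 2 {
            total++
            read_seq = toupper($0)
            for (i = 1; i <= 5; i++) {
                if (index(read_seq, pat[i]) > 0) { flagged++; break }
            }
        }
        END { printf "%d %d\n", flagged + 0, total + 0 }
    ' "$fastq"
}

# Example on a gzipped file (assumed name):
#   gzip -dc read2.fq.gz | count_homopolymer_reads - 20
```

Comparing the flagged fraction between read1 and read2 should make a defect like the one described here stand out immediately.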