11.7 years ago
lh3
BWA-MEM is the successor of BWA and BWA-SW. It has the following features:
- Support for query sequences ranging from ~70bp to a few megabases. Apart from BWA-SW and Last, most read mappers do not work with queries longer than 5kb.
- Fast. Similar to bowtie2 and twice as fast as BWA/BWA-SW/Cushaw2 for 100bp reads. Twice as fast as BWA-SW and several times faster than Bowtie2 and Cushaw2 for >500bp reads (even faster for 1kb reads).
- Accurate. For 100bp simulated data, similar to Cushaw2 in accuracy and more accurate than Bowtie2 and BWA. Novoalign is still the most accurate.
- Working with genomes with total length longer than 4GB. Except BWA since 0.6.x, other free BWT-based mappers have the 4GB limit.
- More permissive (than BWA and GEM) to long gaps up to tens of bp for 100bp reads, or up to several hundred bp (tunable) for contig alignment.
- Reporting chimeric alignments, where different parts of the query map to different places. Note that multiple hits are overlapping alignments of the same query region, whereas the parts of a chimeric alignment are ideally non-overlapping. An aligner that only reports multiple hits may therefore not work well with chimeric alignments in some cases.
- Simpler command line interface and better multi-threading support (than BWA).
- Automatically switching between the end-to-end and local alignment modes. End-to-end alignment reduces false negatives for variants towards the end of a read, but may add false positives for long indels towards the end; local alignment has the opposite trade-off. BWA-MEM attempts to choose the right mode for each read, instead of using one mode for all reads.
- Better paired-end mapping (than BWA, BWA-SW and bowtie2). BWA-MEM uses a similar strategy to stampy and novoalign which jointly considers single-end alignment scores, insert size distribution and the possibility of chimeric pairs.
- Exposing basic APIs for single-end alignment. (Bindings in other languages are welcome.)
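The chimeric-alignment and end-to-end/local items above can be inspected in the SAM output. Below is a minimal Python sketch, not part of BWA itself, that reads one SAM line and reports whether the alignment is clipped (local) or spans the whole read (end-to-end), and whether it is flagged as an extra part of a split read. It assumes a bwa version that marks the extra parts of a chimeric read with the SAM supplementary flag (0x800), or with the secondary flag (0x100) when run with -M. The function names and the example records in the usage note are made up for illustration.

```python
import re

# Flag bits as defined in the SAM specification.
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_bases(cigar):
    """Return (left, right) counts of soft/hard-clipped bases in a CIGAR."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    left = right = 0
    i = 0
    # Clipping operations (S or H) can only appear at the two ends.
    while i < len(ops) and ops[i][1] in "SH":
        left += ops[i][0]
        i += 1
    j = len(ops) - 1
    while j >= i and ops[j][1] in "SH":
        right += ops[j][0]
        j -= 1
    return left, right


def describe(sam_line):
    """Summarise one SAM alignment line (hypothetical helper)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    left, right = clipped_bases(cigar)
    return {
        "name": fields[0],
        # No clipping at either end means the whole read is aligned.
        "end_to_end": cigar != "*" and left == 0 and right == 0,
        "clipped": (left, right),
        "supplementary": bool(flag & FLAG_SUPPLEMENTARY),
        "secondary": bool(flag & FLAG_SECONDARY),
    }
```

For example, `describe("r1\t0\tchr1\t100\t60\t100M\t*\t0\t0\t*\t*")` reports an end-to-end alignment, while a record with flag 2048 and CIGAR `60H40M` is reported as a supplementary alignment with 60 bases clipped on the left.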
BWA-MEM however lacks the following features:
- Working with very short reads. Both the performance and the accuracy of BWA-MEM degrade on such reads.
- Guaranteed sensitivity to hits within a certain edit distance threshold (as with BWA and GEM).
BWA-MEM is a component of BWA. The repository is hosted on GitHub. The released packages are provided via SourceForge. The preprint of the manuscript and a poster (PDF) are also publicly available.
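Because BWA-MEM ships as the `mem` subcommand of the `bwa` executable, a typical run is an index step followed by an alignment step. The Python sketch below only builds the command lines and does not execute them. The helper names and file names are hypothetical. `-t` (threads) and `-w` (band width) are options listed in the `bwa mem` usage message; according to the manual, gaps longer than the band width will not be found, although the maximum gap length also depends on the scoring and the hit length.

```python
import shlex


def bwa_index_command(reference):
    """Command that builds the BWA index for a reference FASTA."""
    return ["bwa", "index", reference]


def bwa_mem_command(reference, reads, mates=None, threads=1, band_width=None):
    """Command for single-end or paired-end alignment with bwa mem.

    SAM is written to standard output, so the caller is expected to
    redirect it to a file.
    """
    cmd = ["bwa", "mem", "-t", str(threads)]
    if band_width is not None:
        # A larger band width permits longer gaps, e.g. for contigs.
        cmd += ["-w", str(band_width)]
    cmd.append(reference)
    cmd.append(reads)
    if mates is not None:
        # A second FASTQ file switches bwa mem to paired-end mode.
        cmd.append(mates)
    return cmd


# Print shell-ready command lines for a paired-end run.
print(shlex.join(bwa_index_command("ref.fa")))
print(shlex.join(bwa_mem_command("ref.fa", "reads_1.fq", "reads_2.fq", threads=4)))
```

To actually run a command, pass the list to `subprocess.run` with `stdout` set to an open output file.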
Have you tested BWA-MEM on PacBio reads? They are pretty long for sequencing reads, but have an error rate of 11-17%...
How do I tune BWA to allow longer gaps for contig alignment? The gap extension penalty is an int and defaults to 1. I don't see which other parameter to adjust to allow longer gaps.