I came across this new short-read aligner, WHAM ( http://www.cs.wisc.edu/wham/ ). It appears to increase alignment speed substantially and it supports indels. They compare it against Bowtie in their paper and it appears to be quite fast; they should have compared it against BWA too. Is anyone using it for their NGS analysis?
Someone pointed me to WHAM a few days ago. The idea behind the indexing is not new; as I remember, PerM uses a similar strategy. A problem with this kind of indexing is that you have to build an index for every read length, and you cannot easily trim reads while mapping. As to the evaluation, the authors asked Bowtie to output all hits, and Bowtie is extremely inefficient in that case. If one wants to see all hits, (s)he should use SOAP2 or a hash-table based mapper.
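To make the read-length constraint concrete, here is a minimal toy sketch (my own illustration, not WHAM's or PerM's actual code) of a pigeonhole-style hash index: the fragment length, and therefore the index itself, is fixed by the read length and the allowed number of mismatches, so reads of a different length, or trimmed reads, need a new index.

    from collections import defaultdict

    READ_LEN = 60        # the index only works for reads of exactly this length
    MAX_MM = 2           # maximum mismatches, fixed at indexing time
    FRAG_LEN = READ_LEN // (MAX_MM + 1)   # pigeonhole: one fragment must match exactly

    def build_index(ref):
        """Hash every FRAG_LEN-mer of the reference to its start positions."""
        index = defaultdict(list)
        for i in range(len(ref) - FRAG_LEN + 1):
            index[ref[i:i + FRAG_LEN]].append(i)
        return index

    def query(index, ref, read):
        """Report all positions where `read` aligns with <= MAX_MM mismatches."""
        assert len(read) == READ_LEN, "reads of another length need a new index"
        hits = set()
        for f in range(MAX_MM + 1):               # try each fragment as the exact seed
            off = f * FRAG_LEN
            for pos in index.get(read[off:off + FRAG_LEN], []):
                start = pos - off
                if start < 0 or start + READ_LEN > len(ref):
                    continue
                mm = sum(a != b for a, b in zip(read, ref[start:start + READ_LEN]))
                if mm <= MAX_MM:
                    hits.add(start)
        return sorted(hits)

    # Toy usage: a 60 bp read drawn from a repetitive reference reports every occurrence.
    ref = "ACGTTGCA" * 50
    read = ref[32:92]
    print(query(build_index(ref), ref, read))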
One additional comment is on memory usage. The manuscript does not say how much memory WHAM uses, but it seems to me that the memory footprint is huge. Bowtie actually trades speed for a small memory footprint. Given enough memory, it can be several times faster at outputting one hit and tens to hundreds of times faster at outputting all hits.
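A rough back-of-the-envelope calculation (my own estimate, not a number from the manuscript) of why a position-level hash index is heavy on a mammalian genome:

    # Back-of-the-envelope only; assumes one 32-bit offset stored per indexed position.
    genome_bp = 3_100_000_000      # approximate size of the human genome
    bytes_per_offset = 4           # 32-bit positions
    print(f"~{genome_bp * bytes_per_offset / 2**30:.1f} GiB for positions alone")
    # ~11.5 GiB before hash keys, buckets, or per-read-length indexes are counted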
Looks interesting. While most aligners have you specify the allowed number of mismatches at the matching stage, this one does so at the indexing stage. In addition, the read length must be specified at index creation.
I didn't see any mention of how, or whether, it uses quality information.
Yes, that is true, but today many people doing NGS have servers with > 64 GB of RAM, and if WHAM can achieve the advertised '1500 million 60bps reads per hour' with indels, then it is pretty impressive, even if it means using reads of the same length.
Several big sequencing centers mostly have 2-4 GB per CPU, or 8-16 GB per computing node. These centers are the major force in popularizing a software package.
OK, I have tried WHAM. It is extremely fast, about 70 Gbp per CPU day (I think '1500 million 60bps reads per hour' is some sort of typo). On the same test data, BWA has a speed of 7 Gbp per CPU day, 10X slower. WHAM should have a similar speed to Bowtie for single-end data.
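A quick bit of arithmetic (my own, assuming the advertised figure is per CPU) shows why I suspect a typo:

    # My own back-of-the-envelope comparison, assuming the advertised figure is per CPU.
    advertised = 1500e6 * 60 * 24 / 1e9   # '1500 million 60 bp reads per hour' in Gbp/day
    measured = 70.0                        # Gbp per CPU day observed in my test
    print(f"advertised ~{advertised:.0f} Gbp/day vs measured ~{measured:.0f} Gbp/day, "
          f"~{advertised / measured:.0f}x apart")   # ~2160 vs ~70 Gbp/day, about 30x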
However, WHAM only mapped 48% of the reads; a lot of perfect unique hits are missed. I used the defaults except for setting the read length (80 bp) and the maximum number of allowed mismatches (4). WHAM takes 20 GB of memory.
So, this is my experience. It is quite possible that I did something silly...
Is there a link to a paper? Looks like the index is tied to the read length.
The command-line arguments used by WHAM are similar to those used by Bowtie, and it does appear to have a flag for quality-aware alignment: http://www.cs.wisc.edu/wham/manual Here is the link to the paper: http://users.cis.fiu.edu/~jli003/papers/li286_wham.pdf
From the comparison with Bowtie in the manuscript, I believe the advertised '1500 million 60bps reads per hour' figure is a typo.