Dear all,
I'm pleased to introduce the discoSnp tool.
discoSnp detects isolated SNPs (in haploid, diploid or polyploid organisms) from any number of read sets (1 to n), without using a reference genome. For each SNP it outputs a small assembly of contigs surrounding the polymorphism, together with the coverage of each allele in each input dataset. The quality of the results is comparable to that obtained with assembly+mapping approaches (SOAPdenovo+Bowtie+GATK) or with the Cortex tool, which can also detect SNPs de novo. Additionally, discoSnp provides a ranking method that makes it easier to distinguish real SNPs from sequencing errors or from polymorphisms due to repeats. This ranking is particularly well suited to finding homozygous SNPs in diploid organisms.
Built on the Minia data structure, discoSnp has a tiny memory footprint (human read sets can be analyzed with no more than 6 GB of memory) while being faster than the other tools mentioned above.
Finally, we made an effort to keep it simple. Even when used from the command line, it mainly requires two parameters: the k-mer size, and the abundance threshold below which a k-mer is considered to be due to a sequencing error.
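To illustrate what these two parameters mean, here is a minimal Python sketch. It is not discoSnp's actual code (which relies on the Minia data structure); it simply counts the k-mers of size k in a few reads and discards those seen fewer times than the threshold. The function names `count_kmers` and `solid_kmers` are hypothetical, and for simplicity the sketch ignores reverse complements.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads.

    Hypothetical helper for illustration; reverse complements are ignored.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(reads, k, min_abundance):
    """Keep k-mers seen at least min_abundance times.

    K-mers occurring fewer times than the threshold are treated as
    likely sequencing errors and dropped.
    """
    return {kmer for kmer, n in count_kmers(reads, k).items() if n >= min_abundance}


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
    print(sorted(solid_kmers(reads, k=4, min_abundance=2)))
```

With k = 4 and a threshold of 2, the k-mers CGTT and GTTC, each seen only once (in the third read), are discarded as likely errors, leaving ACGT, CGTA and GTAC.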
The home web page (download, CeCILL license, manual, Galaxy install) is here.
Best,
Pierre