9.4 years ago
pierre.peterlongo
Hello all.
A new discoSnp++ release is available. http://colibread.inria.fr/software/discosnp/
Existing features in a few words
- Detect SNPs and indels from raw read set(s), without the need of a reference genome.
- Provides genotyping and ranking
- Generates a FASTA file of variant predictions containing a micro-assembly of each variant's left and right neighbors
- Generates a VCF of predicted variants
  - Without a reference: no informative prediction locus
  - Providing a reference sequence: map predictions. The VCF contains mapping information (locus, multiple matches, ...)
- Fast, low memory footprint, easy to use.
New features
- New input format (using file of files)
  - Easier to deal with paired datasets
  - Easier to simulate file concatenation
- The reference can also be used for calling predictions (not only for mapping them)
- The coverage thresholds can be
  - set separately for each read set
  - and/or automatically detected for each set
Any comment or remark, positive or not, is still warmly welcome on this forum with the discoSNP tag.
Nice!
Quick question: how robust would you say the automatic coverage threshold detection is? Should the user always rely on it?
Hi Rayan,
We performed some tests to assess the robustness of the automatic threshold detection method, but not in all situations (data type, read depth and kmer size); in fact, mainly in classical situations.
The automatic threshold detection is pretty robust for whole genome sequencing data with "normal" coverage. If the coverage is too low (say below 10x) or is not homogeneous (for instance in RNA-seq datasets) and the kmer-count profile is monotonically decreasing, the automatic process should be harmless, setting the abundance-min threshold to 3. However, it may be less robust in these cases: if the profile is not strictly decreasing, the method can fail to recognize the situation. For large coverages, say larger than 1000x, the method may be more sensitive to random fluctuations in the kmer-count profile.
Therefore, for atypical read depths or kmer sizes, we recommend looking at the kmer count profile and setting the threshold manually.
Note that discoSnp was shown to produce similar results with small changes in this abundance-min parameter (again in classical situations).
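To make the behaviour described above more concrete, here is a minimal Python sketch of one common way to pick such a threshold from a kmer-count profile: take the first valley of the histogram, and fall back to 3 when the profile only decreases. This is an illustration under assumptions, not the code used in discoSnp++; the function name `pick_abundance_min`, the list-based profile layout and the first-valley rule are all hypothetical choices made for the example.

```python
def pick_abundance_min(profile, fallback=3):
    """Pick an abundance-min threshold from a kmer-count profile.

    Assumed layout (hypothetical): profile[i] is the number of distinct
    kmers that occur exactly i + 1 times in the read set.
    """
    # Scan abundances from low to high and stop at the first valley,
    # i.e. a count strictly lower than both of its neighbours.
    for i in range(1, len(profile) - 1):
        if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]:
            return i + 1  # convert the list index back to an abundance
    # No valley found: the profile is monotonically decreasing
    # (low or non-homogeneous coverage), so use the harmless default.
    return fallback


# Well-covered whole genome data: error peak, valley, then coverage peak.
wgs_profile = [1000, 400, 120, 60, 90, 200, 350, 300, 150, 40]
# Low coverage or RNA-seq-like data: the profile only decreases.
decreasing_profile = [1000, 500, 200, 100, 50, 20, 5]
# Same situation with a small random bump at abundance 3.
noisy_profile = [1000, 400, 410, 200, 100]

print(pick_abundance_min(wgs_profile))
print(pick_abundance_min(decreasing_profile))
print(pick_abundance_min(noisy_profile))
```

The third profile shows the fragility mentioned above: a single fluctuation creates a false valley, so a naive rule no longer falls back to the default. This is why looking at the profile by eye remains the safer option for atypical data.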
Claire
Thanks for the explanation!
Hi,
When I run ./compile_discoSnp++.sh I get this at the end of the compilation (I downloaded http://gatb-tools.gforge.inria.fr/versions/src/DiscoSNP++-2.2.0-Source.tar.gz)
What is the reason for these errors?
Thanks
Hamdi
I must admit I have no clear answer: I cannot see any error message in the log.
Could you provide your compiler version?