Does anyone know of a program that can call SNPs at low coverage? I have been using samtools mpileup and bcftools/vcftools to find high-confidence SNPs. I am now trying to identify unknown samples based on the reference SNP panel I generated. I therefore don't need high confidence when calling SNPs in these unknown samples, since I already know the SNPs exist and I will be comparing calls at hundreds of these SNPs per unknown sample. Average coverage is about 3-5x. I would really appreciate any suggestions.
As long as you didn't prefilter your VCF file, it should contain all the SNPs, from the highly covered to the poorly covered. The depth is reported in the DP sub-field of the INFO column, and you can plot its distribution to get a better understanding of what your pipeline is calling (e.g. whether low-coverage SNPs are included).
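As a rough illustration of that check, here is a minimal Python sketch that tallies the INFO/DP values in a VCF. The function names are made up for this example, and it assumes a standard tab-separated VCF whose records carry a DP key in the INFO column (for a multi-sample file, that value is usually the depth summed over all samples, not per sample).

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def info_depth(info):
    """Return the integer DP value from an INFO string, or None if absent."""
    for item in info.split(";"):
        if item.startswith("DP="):
            return int(item[3:])
    return None


def depth_histogram(lines):
    """Count how many variant records were called at each INFO/DP depth."""
    counts = Counter()
    for line in lines:
        # Skip header lines (## meta and #CHROM) and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        depth = info_depth(fields[7])  # INFO is the 8th column
        if depth is not None:
            counts[depth] += 1
    return counts
```

To use it, pass an open file handle, e.g. `with open_vcf("calls.vcf") as fh: hist = depth_histogram(fh)` (the file name is a placeholder), then print or plot `sorted(hist.items())`. If there are no records at DP 1-5, something upstream is filtering them out.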
Maybe it's a bit late, but I'd like to highlight the discoSnp approach, which might answer the initial question.
discoSnp can predict SNPs and indels from raw NGS reads without a reference genome. It does not depend on read alignment and can find variants with low coverage. It discards all data seen fewer than c times, so calling discoSnp with -c 2 should meet the requirements (although it will miss variants seen only once).
Note that, in a final step, the de novo predicted variants can be mapped to a genome, producing a VCF file that can be used for downstream analyses.
The "clearfilters" flag clears ALL filters, so all variants seen in the reads are reported regardless of depth or quality. Alternatively, you could use the flags "minreads=1 minscore=15", which would simply lower the minimum read count and score a bit, or set the filters manually after reading the documentation. But for very low coverage samples like yours, where you already have a set of known variants you're interested in, "clearfilters" is probably the best choice. BBMap also has another tool, used like this:
Hello all. I want to use BBMap's callvariants to call variants. Where can I get the latest version of the software? Could you give me the link?
You can download BBMap suite here.
Thank you very much!
Could you give us a little bit more background? Do you have several samples at 3-5X? Are they from the same population? Coding regions?