I have used SomaticSniper to call cancer-specific SNPs in a pair of cancer and normal tissue samples:
bam-somaticsniper -q 1 -Q 40 -f ucsc.hg19.fasta ERR031023.bam(normal) ERR031024.bam(cancer) ERR031024.snp.vcf
After collecting the SNPs, I selected 1000 with a somatic score greater than 150.
However, when I used IGV to visualize and verify these 1000 SNPs, I found many problems. For example, on chromosome M most of the SNPs were machine artifacts: many reads, from either the normal or the cancer sample, mapped correctly to the human genome and showed no SNP, yet SomaticSniper reported one.
Moreover, on chromosome 1 SomaticSniper reported SNPs at positions where no reads were mapped to the human genome at all. There are too many false SNPs, which makes the result unreliable.
Is this a problem with my command line, or have you faced a similar problem before? Any suggestions would be appreciated. LI Jia
I reran the SomaticSniper SNP calling. The results are much better; however, for SNPs with high read coverage, the false positive rate is really high. Any suggestions for that? Thank you for your reply.
Great! I'm glad that re-running has helped significantly.
We do not have a great solution for high-coverage SNVs. The problems there are fundamental to the algorithm, and I won't go into the details. Your best bet is to filter out high-coverage calls and/or use a different caller for those high-depth regions. VarScan 2 does a nice job for us in general, but I'm certain there are other callers that would work well.
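If it helps, here is a rough sketch of the "filter out high-coverage calls" idea in Python. It is not part of SomaticSniper; the function names and the depth cutoff are made up for illustration, and it assumes the output is a tab-delimited VCF whose FORMAT column contains a DP key for each sample. Records without DP are simply passed through.

```python
def sample_depths(vcf_line):
    """Return the DP value of every sample column in one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        # No FORMAT/sample columns, so no per-sample depth to report.
        return []
    keys = fields[8].split(":")
    if "DP" not in keys:
        return []
    dp_index = keys.index("DP")
    depths = []
    for sample in fields[9:]:
        values = sample.split(":")
        # Skip missing values such as "." instead of failing on them.
        if dp_index < len(values) and values[dp_index].isdigit():
            depths.append(int(values[dp_index]))
    return depths


def filter_high_depth(lines, max_depth):
    """Keep header lines and records whose per-sample DP never exceeds max_depth.

    max_depth is a hypothetical cutoff; choose it from the depth
    distribution of your own data.
    """
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        # Records with no usable DP yield an empty list and are kept.
        if all(depth <= max_depth for depth in sample_depths(line)):
            kept.append(line)
    return kept
```

To use it, read the VCF lines from your file, call `filter_high_depth(lines, max_depth)` with a cutoff that suits your data, and write the returned lines back out. The calls that get dropped are the ones to re-check with another caller.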