I've been using Samtools/BCFtools to identify SNPs in bacterial genomes mapped to a reference. For some uses that's enough, but for others I want evidence that a variant does NOT exist, i.e. that the absence of a SNP call is not due to (for example) poor data in that region.
I've searched for ways of generating output for all positions, and found (a) that no one seems to discuss this, and (b) that dropping all three of the '-cgv' options when running BCFtools works, and that also dropping the '-e' option gives slightly different output. But these still aren't ideal: for example, they don't call the bases properly, or give a quality score for a position being non-variant.
So what I see is, e.g.:
Normal output: (shows a SNP)
strain1234 1834177 . A C 180 . DP=51;VDB=0.0280;AF1=1;AC1=2;DP4=0,0,29,19;MQ=58;FQ=-168 GT:PL:DP:GQ 1/1:213,141,0:48:99
-cgv options removed: (shows a non-SNP, and the above SNP)
strain1234 1377185 . C X 0 . DP=65;AF1=0 PL:DP 0,196,255:65
strain1234 1834177 . A C,G,X 0 . DP=51;VDB=0.0280;AF1=1 PL:DP 213,141,0,216,128,213,213,141,216,213:48
-cgv and -e options removed: (shows a non-SNP, and the above SNP)
strain1234 1377185 . C X 0 . DP=65;I16=33,32,0,0,1505,36349,0,0,3659,216179,0,0,1451,34591,0,0 PL:DP 0,196,255:65
strain1234 1834177 . A C,G,X 0 . DP=51;I16=0,0,29,19,0,0,864,17124,0,0,2774,162396,0,0,971,22103;VDB=0.0280 PL:DP 213,141,0,216,128,213,213,141,216,213:48
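One rough way of reading records like these is to compare the PL value of the reference genotype with the best of the others. This is only a sketch under some assumptions: the function name ref_support and the "margin" it reports are made up for illustration (they are not BCFtools output), the PL list is taken to follow the diploid VCF ordering with the ref/ref genotype first, and the line is split on any whitespace as pasted here (real VCF is tab-separated).

```python
def ref_support(vcf_line):
    """Summarise how strongly one VCF record supports the reference genotype.

    Hypothetical helper: assumes a single sample column and a FORMAT
    string that contains both PL and DP, as in the records above.
    """
    fields = vcf_line.split()
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]

    # Pair up FORMAT keys (e.g. PL:DP) with the sample values.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    pl = [int(x) for x in sample["PL"].split(",")]
    depth = int(sample["DP"])

    # PL[0] is the ref/ref genotype; lower PL means more likely.
    # A positive margin means ref/ref is favoured over every other genotype.
    margin = min(pl[1:]) - pl[0]
    return {"chrom": chrom, "pos": pos, "ref": ref,
            "depth": depth, "margin": margin}


non_snp = "strain1234 1377185 . C X 0 . DP=65;AF1=0 PL:DP 0,196,255:65"
snp = ("strain1234 1834177 . A C,G,X 0 . DP=51;VDB=0.0280;AF1=1 "
       "PL:DP 213,141,0,216,128,213,213,141,216,213:48")

for record in (non_snp, snp):
    print(ref_support(record))
```

For the two records above this gives a margin of 196 at depth 65 for the non-SNP position and -213 at depth 48 for the SNP position. It is not a calibrated quality score, just a way of separating "well covered and reference-like" from "not".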
I wondered if anyone had ideas on this?
That's a nice compromise, simply outputting depth and placement information for the reference allele.
The problem I always had when confronting this was that, when including indels, there are a huge number of possible alleles per position.
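To put a number on that: for a diploid call the PL field carries one value per genotype, so its length grows quadratically with the number of alleles listed at a site. A small sketch, with function names that are purely illustrative and assuming the standard VCF genotype ordering:

```python
def diploid_genotype_count(n_alleles):
    """Number of unordered diploid genotypes for n alleles (REF included)."""
    return n_alleles * (n_alleles + 1) // 2


def pl_index(j, k):
    """Position of genotype j/k in the PL list under VCF ordering."""
    j, k = sorted((j, k))
    return k * (k + 1) // 2 + j


# The SNP record in the question lists A plus C,G,X: 4 alleles, 10 PL values.
print(diploid_genotype_count(4))
# Genotype 1/1 (the C/C call) sits at index 2, where the PL of 0 appears.
print(pl_index(1, 1))
```

With only a few extra candidate indel alleles at a position, the genotype list already gets unwieldy, which is the problem described above.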