I'm running bbmap's callvariants.sh this way for a diploid population of several samples:
callvariants.sh -Xmx20g ploidy=2 nopassdot=t multisample=t in=file1.bam,file2.bam,... vcf=results.vcf
Each bam file represents one individual.
Now I have two problems that I'm fixing manually with a script, but I wonder how to fix this and the help doesn't get me far (p(I'm too tired) < 0.05)
1) My header looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT FORMAT .....
each FORMAT should be the filename of the bam file, how do I tell callvariants.sh do directly take the filename?
2) I have very few reference alleles, in other words, I have very few 0/0 in my output:
0/0:26:5:0.1923:0.2403:0.9980:20.82:PASS
looks good! Depth is 26, seems to be well supported
./.:31:4:0.1290:0.1599:0.9980:15.93:FAIL
why is this missing even though heterozygotes like '0/1:14:5:0.3571:0.4590:0.9980:34.58:PASS' are PASS? For a 38GB vcf file I have 5,201 SNPs with at least one 0/0 but 3,253,627 SNPs with at least one 1/1, that seems imbalanced. (I realise that a large part of these 3 million are garbage, but this is a B. oleracea chromosome, there should be more than 5000 SNPs)
If I set nopassdot=f then the reference alleles and the failing alleles are both labelled 0/0 if I see this correctly. Currently my script replaces ./. cases like the above with a decent depth > 4 by 0/0, but I wonder whether bbmap has an option which I'm not seeing.
(The format string is GT:DP:AD:AF:RAF:SB:SC:PF)
I'm using the standard cutoffs of bbmap:
minreads=2 Ignore variants seen in fewer reads. minqualitymax=15 Ignore variants with lower max base quality. minedistmax=20 Ignore variants with lower max distance from read ends. minmapqmax=0 Ignore variants with lower max mapq. minidmax=0 Ignore variants with lower max read identity. minpairingrate=0.1 Ignore variants with lower pairing rate. minstrandratio=0.1 Ignore variants with lower plus/minus strand ratio. minquality=12.0 Ignore variants with lower average base quality. minedist=10.0 Ignore variants with lower average distance from ends. minavgmapq=0.0 Ignore variants with lower average mapq. minallelefraction=0.1 Ignore variants with lower allele fraction. This should be adjusted for high ploidies. minid=0 Ignore variants with lower average read identity. minscore=20.0 Ignore variants with lower Phred-scaled score.
I'm using bbmap v37.90 with java v1.8.0_151
edit: If I run the same files with the bcftools mpileup as described at http://www.htslib.org/workflow/ I get a 0/0, 1/1 ratio that is much much closer to 1:1
Thank you for your very detailed answer with commands! I'll give recalibration a try!