Hello,
I am trying to merge VCF files containing samples from several different sequencing runs. I ran `bcftools merge` on the VCF files, and after ten hours it stopped with this error: "Incorrect number of FORMAT/GP values at chr_Y:216795, cannot merge. The tag is defined as Number=G, but found 2 values and 3 alleles. See also http://samtools.github.io/bcftools/howtos/FAQ.html#incorrect-nfields"
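For reference, the VCF specification defines a Number=G field as holding one value per possible genotype, so the expected count depends on both the number of alleles and the ploidy. The minimal Python sketch below applies that rule (the helper names are hypothetical, for illustration only). It shows that 2 values do not fit a 3-allele site at either haploid or diploid ploidy, which is the combination the error message reports.

```python
from math import comb


def expected_g_count(n_alleles: int, ploidy: int) -> int:
    """Number of values a Number=G FORMAT field should hold: one per
    unordered genotype, i.e. the number of multisets of size `ploidy`
    drawn from `n_alleles` alleles."""
    return comb(n_alleles + ploidy - 1, ploidy)


def matching_ploidies(n_values: int, n_alleles: int, max_ploidy: int = 2) -> list[int]:
    """Ploidies (1..max_ploidy) for which `n_values` is a valid Number=G count."""
    return [p for p in range(1, max_ploidy + 1)
            if expected_g_count(n_alleles, p) == n_values]


if __name__ == "__main__":
    for n in (2, 3):
        print(f"{n} alleles: haploid={expected_g_count(n, 1)}, diploid={expected_g_count(n, 2)}")
    # The combination from the error message: 2 values at a 3-allele site.
    print("2 values, 3 alleles ->", matching_ploidies(2, 3))
```

By contrast, `matching_ploidies(2, 2)` returns `[1]`: 2 values are a valid haploid count at a biallelic site.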
The problem seems to come from the Y-chromosome genotypes of the males. When calling genotypes with `bcftools call`, I used the `--ploidy-file` option to make chromosome Y haploid for male samples.
This is the site that made `bcftools merge` crash. As the error message suggests, there are three alleles (A, T and G) at this position across the samples.
Line in VCF from one male:
chr_Y 216795 . A T 136.416 . DP=7;VDB=0.226073;SGB=-0.636426;MQSBZ=-1.32288;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,4,3;MQ=42 GT:PL:DP:SP:AD:GP:GQ 1:166,0:7:0:0,7:-nan:127
Line in VCF from another male:
chr_Y 216795 . A G 225.417 . DP=17;VDB=0.77499;SGB=-0.690438;MQSBZ=-0.293759;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,9,8;MQ=34 GT:PL:DP:SP:AD:GP:GQ 1:255,0:17:0:0,17:-nan:127
And another male:
chr_Y 216795 . A . 263.589 . DP=10;MQSBZ=-0.847399;FS=0;MQ0F=0;AN=1;DP4=8,2,0,0;MQ=48 GT:DP:SP:AD 0:10:0:10
The males with non-reference alleles have `-nan` in the GP field. Could this be what made it crash?
Is there a workaround for this problem? I am thinking of simply stripping the GP field from all the VCFs with `bcftools annotate -x FORMAT/GP`, but that seems quite harsh, so I wanted to see if there is another solution. Many sites on chr_Y seem to have this problem, so I don't want to exclude those lines manually.
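Rather than excluding sites by hand, the affected records could at least be listed automatically before merging. The sketch below is a hypothetical helper, not a bcftools feature. It assumes tab-delimited VCF text (plain or bgzipped) and infers each sample's ploidy from its GT. It then flags GP values that are NaN, or whose count does not match the Number=G rule for that ploidy and the record's number of alleles.

```python
import gzip
import math
import sys
from math import comb


def gp_problems(vcf_line: str, tag: str = "GP") -> list[str]:
    """Hypothetical helper: describe problems with a Number=G FORMAT tag
    (GP by default) in one tab-delimited VCF data line."""
    if vcf_line.startswith("#"):
        return []  # header lines carry no genotype data
    cols = vcf_line.rstrip("\n").split("\t")
    if len(cols) < 10:
        return []  # no FORMAT/sample columns
    alt = cols[4]
    n_alleles = 1 + (0 if alt == "." else len(alt.split(",")))
    keys = cols[8].split(":")
    if tag not in keys or "GT" not in keys:
        return []
    tag_i, gt_i = keys.index(tag), keys.index("GT")
    problems = []
    for s, sample in enumerate(cols[9:]):
        vals = sample.split(":")
        if max(tag_i, gt_i) >= len(vals) or vals[tag_i] == ".":
            continue  # tag or GT absent/missing for this sample
        values = vals[tag_i].split(",")
        # Ploidy = number of alleles in GT, e.g. "1" -> 1, "0/1" -> 2.
        ploidy = len(vals[gt_i].replace("|", "/").split("/"))
        expected = comb(n_alleles + ploidy - 1, ploidy)
        # float() accepts "nan" and "-nan", so NaN values are detected here.
        if any(v != "." and math.isnan(float(v)) for v in values):
            problems.append(f"{cols[0]}:{cols[1]} sample {s}: {tag} is NaN")
        if len(values) != expected:
            problems.append(
                f"{cols[0]}:{cols[1]} sample {s}: {tag} has {len(values)} value(s), "
                f"expected {expected} (ploidy {ploidy}, {n_alleles} alleles)"
            )
    return problems


def main(paths: list[str]) -> None:
    """Print every flagged record for each VCF path (.gz read via gzip)."""
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as fh:
            for line in fh:
                for problem in gp_problems(line):
                    print(f"{path}\t{problem}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run on the first male record above, it would report both a NaN GP and 1 value where 2 are expected for a haploid biallelic call. The third male record has no GP in its FORMAT column, so it would not be flagged.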
Thank you!