Hi,
I'm running samtools (version 1.3.1, Ubuntu 17.04 default) to generate a VCF from a reference and some BAM files:
samtools mpileup --ff 0x800 -r my_contig -v -f my_genome.fa *.bam -o my.vcf
But in the VCF file, all lines have a format like:
my_contig 4 . A <*> 0 . DP=1;I16=1,0,0,0,34,1156,0,0,0,0,0,0,0,0,0,0;QS=1,0;MQ0F=1 PL 0,0,0 0,3,4
In short, ALT (alternative allele) is set to "<*>". According to the VCF specification, * indicates a deletion, while angle brackets indicate some sort of ID string. To me, neither of these makes much sense, and here the depth is 1 - how can there be any variants here? (For actual polymorphic sites, ALT is something like G,<*>. As if that helps.)
I'm quite confused by this, and a subsequent 'bcftools' similarly fails:
Symbolic alleles other than <DEL> are currently not supported: <*> at my_contig:4
I can generate the consensus using a program I've written myself, but I would like to leverage whatever magic bcftools uses to QC polymorphisms, and also I think it is better to stick to more mainstream tools - providing they work, that is.
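To get a feel for how many records carry only the <*> placeholder and how many list a real alternative allele next to it (like the G,<*> case mentioned above), a rough sketch with awk could look like this. It assumes an uncompressed, tab-delimited VCF, and the function name count_alt_classes is made up for illustration.

```shell
# Count VCF records whose ALT is only the <*> placeholder versus
# records that list at least one concrete alternative allele.
# Assumes an uncompressed, tab-delimited VCF; the function name is hypothetical.
count_alt_classes() {
  awk -F'\t' '
    /^#/        { next }                     # skip header lines
    $5 == "<*>" { only_symbolic++; next }    # ALT holds nothing but the placeholder
                { with_allele++ }            # ALT lists a concrete allele, such as G,<*>
    END { printf "only_symbolic=%d with_allele=%d\n", only_symbolic, with_allele }
  ' "$1"
}

# Usage: count_alt_classes my.vcf
```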
Was there ever a resolution to this problem? I'm also trying to get a consensus out of my VCF (same reason, I want the QC), and I'm running into exactly the same error.
If you want a solution, i.e., to not have to deal with these ridiculous <*> calls, then piping bcftools mpileup into bcftools call should not result in these alleles being included in your final VCF. Check out my code here (Analysis Step 7): https://github.com/kevinblighe/ClinicalGradeDNAseq/blob/master/AnalysisMasterVersion1.sh
It's generated here: https://github.com/samtools/samtools/blob/master/bam2bcf.c#L741 I think there is no call/no ALT here. Did you run bcftools call with '--variants-only'?
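For reference, the mpileup-into-call pipeline suggested above might look roughly like the sketch below. As far as I understand it, <*> in mpileup output is a placeholder for any alternative allele not seen in the reads, so it shows up at every position until a caller has actually decided on genotypes. The helper name call_variants and the file names in the usage line are hypothetical, and the sketch assumes a bcftools release that ships the mpileup subcommand (the samtools 1.3.1 setup from the question would need samtools mpileup as the first stage instead).

```shell
# Sketch only: call variants for one region from several BAM files.
# call_variants is a hypothetical helper; it assumes a bcftools release
# that includes the mpileup subcommand.
call_variants() {
  ref=$1
  region=$2
  out=$3
  shift 3   # the remaining arguments are the BAM files
  # mpileup emits genotype likelihoods as uncompressed BCF (-Ou);
  # call uses the multiallelic caller (-m), keeps variant sites only (-v),
  # and writes plain-text VCF (-Ov) to the requested output file.
  bcftools mpileup -Ou -f "$ref" -r "$region" "$@" |
    bcftools call -m -v -Ov -o "$out"
}

# Usage: call_variants my_genome.fa my_contig my.calls.vcf *.bam
```

The -v flag is the short form of the --variants-only option asked about above. Leaving it out keeps the non-variant sites in the output as well.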
I didn't use --variants-only (it's not an option to 'bcftools consensus', which I used). The output is from samtools and not bcftools, anyway. Thanks for the code pointer, but I can't really understand how this is supposed to work.
What's the VCF version? Check the first line of the VCF file to find it.
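A quick way to do that check from the command line, as a minimal sketch: the first header line of a VCF has the form ##fileformat=VCFv followed by the version number, so printing that line is enough. The function name vcf_version is made up here, and the sketch assumes an uncompressed file (decompress first, for example with gzip -dc, if it is compressed).

```shell
# Print the VCF version declared on the first header line,
# e.g. "4.2" for "##fileformat=VCFv4.2".
# vcf_version is a hypothetical helper name; input must be uncompressed.
vcf_version() {
  head -n 1 "$1" | sed -n 's/^##fileformat=VCFv//p'
}

# Usage: vcf_version my.vcf
```

If the function prints nothing, the first line is not a fileformat header, which would itself be worth reporting.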