Hi all,
I'm confused by a samtools behavior. Using the following code:
$ samtools mpileup -u -r X -f my.fasta my.bam | bcftools view -bvcg - > var.raw.bcf
$ bcftools view var.raw.bcf | vcfutils.pl varFilter -D500 > my-snps.vcf
I get a list of SNPs and InDels. But, I end up with a few cases of samtools calling heterozygous SNPs when there may be 50 reads supporting one SNP (the ref SNP) and one read giving a different SNP. This strikes me as an obvious case in which samtools should ignore the one read.
I can't seem to figure out an easy way around this. Am I missing something simple? Any suggestions?
So, just to re-iterate: at location X I may have 50 reads that are C, and one read which gives T. My .vcf file reports C,T SNP here. I'd just like it to report C.
Thanks in advance.
Are you sure it is calling at heterozygous SNP? Take a look at the
GT
field in your file. It should look something like0/0
0/1
or1/1
corresponding to homozygous reference, heterozygous, and homozygous alternate. I believe that you can have an alternate allele observed but not actually a heterozygous call. Posting the actual data from your file would help.Matt Shirley is right, probably not a het call.