Hello,
I'm trying to filter rows of my VCF file based on a condition. I'd like to keep only variants where the alternate allele is present in at least 25% of the reads (alternate allele frequency). Usually I use SnpSift filter, but I'm unable to access the FORMAT fields of the VCF file with that tool, so I chose bcftools, which unfortunately I'm not familiar with. What did I do? Based on the vignette, I compressed the file and created an index:
bgzip -c input.vcf > input.vcf.gz
bcftools index input.vcf.gz
Then I tried to filter:
bcftools filter -i 'FORMAT/AD[0:1]*100/(FORMAT/AD[0:0]+FORMAT/AD[0:1]) >= 25' input.vcf.gz > output.vcf
I understand this as follows:
FORMAT/AD[0:1] - number of reads supporting the ALT allele in the first sample
FORMAT/AD[0:0] - number of reads supporting the REF allele in the first sample
Of course I could use DP instead of FORMAT/AD[0:0]+FORMAT/AD[0:1]; however, the variants were called using GATK, and AD is already filtered, in contrast to DP. What is more, DP contains all reads, I mean reads for all possible alleles, whereas AD contains only the REF and ALT reads.
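To be explicit about the frequency I mean, here is a minimal sketch of the calculation in awk. The two records are toy data made up for illustration (not taken from my file), the function names toy_records and alt_fraction are hypothetical, and the sketch assumes a single-sample VCF with biallelic sites.

```shell
# Toy single-sample records, made up for illustration only.
# Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
toy_records() {
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:30,10:42\n'
  printf 'chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD:DP\t0/1:45,5:50\n'
}

# Print CHROM, POS and ALT percentage = ALT * 100 / (REF + ALT),
# using only the first two AD values of the first sample.
alt_fraction() {
  LC_ALL=C awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      n = split($9, keys, ":")             # FORMAT keys, e.g. GT:AD:DP
      split($10, vals, ":")                # values of the first sample
      ad = ""
      for (i = 1; i <= n; i++)
        if (keys[i] == "AD") ad = vals[i]  # locate AD by name, not position
      m = split(ad, depth, ",")            # depth[1] = REF, depth[2] = ALT
      if (m < 2) next                      # AD missing or incomplete
      total = depth[1] + depth[2]
      if (total == 0) next                 # avoid division by zero
      printf "%s\t%s\t%.2f\n", $1, $2, depth[2] * 100 / total
    }'
}

toy_records | alt_fraction
```

With these toy records the first site comes out at exactly 25.00 (so ">= 25" should keep it) and the second at 10.00 (so it should be dropped). On the real file the same function could presumably be fed from `bcftools view -H input.vcf.gz`.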
The problem is this: when I run the command, many variants are indeed filtered out; however, when I check the frequency manually, many of the remaining variants still have a percentage below 25. I don't know why. I suspect it is some silly mistake that I can't spot. What do you think about this command?