I'm trying to track down some issues in a few VCF files where only one ALT allele is listed, but there is more than one AO value and/or more than two AD values. I know bcftools can help me filter these using the COUNT function, but I'm struggling with the syntax.
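For context, the mismatch itself can be stated per sample: with n ALT alleles, AD should carry n + 1 values (REF plus each ALT) and AO should carry n values. This assumes AD is declared Number=R and AO Number=A in the header, as is typical for FreeBayes-style output. The plain-Python sketch below (not part of bcftools; function names are hypothetical) flags the samples in one VCF data line whose AD or AO length disagrees with the ALT count, treating a missing value ('.') as "nothing to check".

```python
def count_values(field):
    """Number of comma-separated values; '.' or empty counts as 0 (assumption)."""
    return 0 if field in (".", "") else len(field.split(","))


def mismatched_samples(vcf_line):
    """Return 0-based sample indices whose AD/AO length disagrees with the ALT count.

    Assumes a tab-separated VCF data line, AD with Number=R (REF + ALTs)
    and AO with Number=A (one value per ALT).
    """
    cols = vcf_line.rstrip("\n").split("\t")
    n_alt = 0 if cols[4] == "." else len(cols[4].split(","))
    keys = cols[8].split(":")
    bad = []
    for i, sample in enumerate(cols[9:]):
        # zip() stops early if trailing sample fields were dropped; those count as missing.
        values = dict(zip(keys, sample.split(":")))
        ad = values.get("AD", ".")
        ao = values.get("AO", ".")
        if ad != "." and count_values(ad) != n_alt + 1:
            bad.append(i)
        elif ao != "." and count_values(ao) != n_alt:
            bad.append(i)
    return bad
```

For a site with a single ALT, a sample with AD=5,3 and AO=3 passes, while one with AD=5,3,2 or AO=3,1 is reported.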
Say I'm looking for entries in the VCF where all samples have exactly two AD values. I can do this for a specific sample; for example, this finds entries in which the first sample has two AD values:
bcftools filter -i 'COUNT(AD[0:*]) == 2' input.vcf.gz > filtered.vcf
I can do the same for any sample index x with COUNT(AD[x:*]) == 2. The problem is that I want to select entries where all samples have two AD values. I've tried many permutations of the indexing syntax and can't get it to work.
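If staying inside bcftools filter is not a hard requirement, the "all samples" condition is easy to express outside it. The sketch below is a plain-Python swap-in, not a bcftools feature, and the function names are hypothetical. It assumes well-formed, tab-separated VCF lines, reads either an uncompressed or a bgzipped file (Python's gzip module reads multi-member gzip streams such as BGZF), treats a missing AD ('.') as zero values, and keeps a record only when every sample has exactly n AD values.

```python
import gzip


def ad_counts(vcf_line):
    """Return the number of comma-separated AD values for each sample in a VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    samples = cols[9:]
    fmt_keys = cols[8].split(":")
    if "AD" not in fmt_keys:
        return [0] * len(samples)
    idx = fmt_keys.index("AD")
    counts = []
    for sample in samples:
        fields = sample.split(":")
        # A dropped trailing field is treated the same as an explicit '.'.
        ad = fields[idx] if idx < len(fields) else "."
        counts.append(0 if ad == "." else len(ad.split(",")))
    return counts


def all_samples_have_n_ad(vcf_line, n=2):
    """True only if the line has at least one sample and every sample has n AD values."""
    counts = ad_counts(vcf_line)
    return len(counts) > 0 and all(c == n for c in counts)


def filter_vcf(in_path, out_path, n=2):
    """Copy header lines plus records where all samples have exactly n AD values."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith("#") or all_samples_have_n_ad(line, n):
                dst.write(line)
```

Usage would be something like filter_vcf("input.vcf.gz", "filtered.vcf"). This streams the whole file and ignores any index, so it trades speed for not having to enumerate samples in an expression.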
Is it possible to do what I want with bcftools filter? Or do I have to index each sample individually, like this:
bcftools filter -i 'COUNT(AD[0:*]) == 2 & COUNT(AD[1:*]) == 2 & COUNT(AD[2:*]) == 2 & ...'
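If indexing each sample turns out to be the only route, the chained expression does not have to be typed by hand. A minimal sketch, assuming samples are addressed by 0-based index as in the commands above; the helper name all_samples_expr is hypothetical, and it joins clauses with & exactly as in the manual version rather than assuming any other operator semantics.

```python
def all_samples_expr(n_samples, n_values=2, tag="AD"):
    """Build 'COUNT(AD[0:*]) == 2 & COUNT(AD[1:*]) == 2 & ...' for n_samples samples."""
    if n_samples < 1:
        raise ValueError("need at least one sample")
    clauses = [f"COUNT({tag}[{i}:*]) == {n_values}" for i in range(n_samples)]
    return " & ".join(clauses)


if __name__ == "__main__":
    # Example: print the expression for a three-sample VCF.
    print(all_samples_expr(3))
```

The sample count can be taken from bcftools query -l input.vcf.gz | wc -l, and the resulting string is passed to bcftools filter -i as a single quoted argument.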
Thanks!
This is cool, and I appreciate it. I'll make a note of it and will probably end up using it, but my question was whether this is possible with bcftools filter.
Actually, forget it. This does exactly what I want, so while I'm still interested in whether it's possible with bcftools filter, I'm going to mark this as answered.