Using COUNT with bcftools filter
29 days ago
mykle hoban ▴ 40

I'm trying to track down some issues in a few VCF files where only one ALT allele is listed, but there is more than one AO value and/or more than two AD values. I know bcftools can help me filter these out using the COUNT function, but I'm struggling with the syntax.

Say I'm looking for entries in the VCF where all samples have exactly two AD values.

I can do this for specific samples, like this (which will find entries in which the first sample has two AD values):

bcftools filter -i 'COUNT(AD[0:*]) == 2' input.vcf.gz > filtered.vcf

I can do this for any numbered sample x with COUNT(AD[x:*]) == 2. The problem is that I want to keep only the entries where all samples have two AD values. I've tried multiple permutations of the indexing syntax and I can't get it to work.

Is it possible to do what I want with bcftools filter? Or do I have to index each individual sample, like this:

bcftools filter -i 'COUNT(AD[0:*]) == 2 & COUNT(AD[1:*]) == 2 & COUNT(AD[2:*]) == 2 & ...'
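
If it does come down to indexing every sample, the expression could at least be generated instead of typed by hand. This is only a rough sketch: build_ad_expr is a made-up helper name (not part of bcftools), and it assumes the per-sample COUNT(AD[x:*]) == 2 form behaves as described above for each index.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a bcftools feature): print the per-sample
# expression 'COUNT(AD[0:*]) == 2 & COUNT(AD[1:*]) == 2 & ...' for n samples.
build_ad_expr() {
    local n=$1 expr="" i
    for ((i = 0; i < n; i++)); do
        # Join the per-sample clauses with ' & ', as in the hand-written version
        if [[ -n $expr ]]; then
            expr+=" & "
        fi
        expr+="COUNT(AD[${i}:*]) == 2"
    done
    printf '%s\n' "$expr"
}

# Usage sketch, assuming bcftools is on PATH and input.vcf.gz exists:
#   n=$(bcftools query -l input.vcf.gz | wc -l)   # one sample name per line
#   bcftools filter -i "$(build_ad_expr "$n")" input.vcf.gz > filtered.vcf
```

The sample count comes from bcftools query -l, which lists one sample name per line, so the expression grows with however many samples the file has.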

Thanks!

bcftools vcf filter • 429 views
29 days ago

Using vcffilterjdk:

java -jar jvarkit.jar vcffilterjdk -e 'return variant.getGenotypes().stream().allMatch(G->G.hasAD() && G.getAD().length==2);' in.vcf

This is cool, and I appreciate it. I'll make a note of it and will probably end up using it, but my question was whether this is possible with bcftools filter.

Actually, forget it. This does exactly what I want, so while I'm still interested in whether it's possible with bcftools filter, I'm gonna mark this as answered.
