I have some INFO fields that look like this:
##INFO=<ID=AF,Number=1,Type=Float,Description="Estimated Alternate Allele Frequency">
##INFO=<ID=MAF,Number=1,Type=Float,Description="Estimated Minor Allele Frequency">
##INFO=<ID=R2,Number=1,Type=Float,Description="Estimated Imputation Accuracy (R-square)">
##INFO=<ID=ER2,Number=1,Type=Float,Description="Empirical (Leave-One-Out) R-square (available only for genotyped variants)">
##INFO=<ID=IMPUTED,Number=0,Type=Flag,Description="Marker was imputed but NOT genotyped">
##INFO=<ID=TYPED,Number=0,Type=Flag,Description="Marker was genotyped AND imputed">
##INFO=<ID=TYPED_ONLY,Number=0,Type=Flag,Description="Marker was genotyped but NOT imputed">
I understand that if I wanted to filter on a key-value pair in the INFO column, I could run:
bcftools view -e 'R2<0.9' my_fav.vcf.gz
The IMPUTED, TYPED, and TYPED_ONLY keys appear in the INFO field with no corresponding value, for example:
AF=0.00016;MAF=0.03036;R2=0.13409;IMPUTED
is a complete info field.
Is there a way to filter TYPED vs. IMPUTED variants using bcftools? I know I could probably grep my way through this if need be.
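As a sketch of the grep-style fallback mentioned above: a plain grep for TYPED would also match TYPED_ONLY, so it is safer to split the INFO column on ";" and compare whole keys. The function name keep_info_flag is made up for illustration, and the sketch assumes a tab-delimited VCF with INFO in the 8th column.

```shell
# keep_info_flag FLAG (hypothetical helper name): read VCF text on stdin and
# print all header lines plus the records whose INFO column (column 8)
# contains FLAG as a whole key, so TYPED does not match TYPED_ONLY.
keep_info_flag() {
  awk -F'\t' -v flag="$1" '
    /^#/ { print; next }                 # pass header lines through unchanged
    {
      n = split($8, keys, ";")           # INFO entries are ";"-separated
      for (i = 1; i <= n; i++)
        if (keys[i] == flag) { print; break }
    }
  '
}
```

It could be used along the lines of: zcat my_fav.vcf.gz | keep_info_flag TYPED | bgzip > typed.vcf.gz (assuming bgzip is available). Unlike bcftools, this does no validation of the VCF.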
Interesting question! Can you check whether an aggregate function (such as COUNT(IMPUTED)>0) picks up these entries?
It seems not :(. This valueless key is standard output from the Minimac imputation software. I wish it were the value of some key, though.
Have you tried:
bcftools view -e 'IMPUTED' my_fav.vcf.gz
or
bcftools view -i 'IMPUTED' my_fav.vcf.gz
This was the first thing I tried before posting. When it did not work, I was basically stuck because I couldn't see how to do it outside of bcftools. I just like using bcftools because it sometimes helps me catch corrupted VCFs.
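For what it's worth, the EXPRESSIONS section of the bcftools manual describes testing a flag by comparing it to 1 (present) or 0 (absent), rather than naming the flag on its own. A minimal sketch on a made-up two-record VCF follows; the demo records are invented, and the behaviour may depend on the bcftools version installed.

```shell
# Build a tiny demo VCF (made-up records) with Minimac-style flag keys.
demo_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=1>\n'
  printf '##INFO=<ID=R2,Number=1,Type=Float,Description="Estimated Imputation Accuracy (R-square)">\n'
  printf '##INFO=<ID=IMPUTED,Number=0,Type=Flag,Description="Marker was imputed but NOT genotyped">\n'
  printf '##INFO=<ID=TYPED,Number=0,Type=Flag,Description="Marker was genotyped AND imputed">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t100\t.\tA\tG\t.\t.\tR2=0.13409;IMPUTED\n'
  printf '1\t200\t.\tC\tT\t.\t.\tR2=0.99;TYPED\n'
} > "$demo_vcf"

# Only run the filters if bcftools is actually installed.
if command -v bcftools >/dev/null 2>&1; then
  # FLAG=1 selects records where the flag is present; -H drops the header.
  imputed_pos=$(bcftools view -H -i 'INFO/IMPUTED=1' "$demo_vcf" | cut -f2)
  typed_pos=$(bcftools view -H -i 'INFO/TYPED=1' "$demo_vcf" | cut -f2)
  echo "imputed positions: $imputed_pos"
  echo "typed positions: $typed_pos"
fi
```

If this works on your version, the same expression should apply to the real file, e.g. bcftools view -i 'INFO/TYPED=1' my_fav.vcf.gz, and it can be combined with other conditions such as R2.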