I have a merged VCF file with multiple samples on a joined SNP set. The original samples have phased genotypes (GT) of 0|0, 0|1, 1|0 or 1|1, and the merged samples are formatted as 0/0 where the SNP was missing (the --missing-to-ref option of bcftools merge).
I'd like to filter the file to keep only those SNPs that don't have 0/0 in the GT field of both of the last two sample columns. Here's an excerpt from the VCF file (header and first four example lines):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT orig_s1 orig_s2 orig_s3 merged_s1 merged_s2
chr5 11875 . G A . PASS VT=SNP;NS=2548;DP=1908;AF=0.05;EAS_AF=0;EUR_AF=0.01;AFR_AF=0.17;AMR_AF=0.02;SAS_AF=0.02;AN=5100;AC=267 GT 1|0 0|0 0|0 0/0 0/0
chr5 11956 . G C . PASS VT=SNP;NS=2548;DP=13160;AF=0.03;EAS_AF=0;EUR_AF=0;AFR_AF=0.11;AMR_AF=0.01;SAS_AF=0;AN=5100;AC=148 GT 0|0 0|0 0|0 1|1 0/0
chr5 11989 . G A . PASS VT=SNP;NS=2548;DP=14138;AF=0;EAS_AF=0;EUR_AF=0;AFR_AF=0;AMR_AF=0;SAS_AF=0;AN=5100;AC=1 GT 0|0 0|0 0|0 0/0 1|1
chr5 12002 . G C . PASS VT=SNP;NS=2548;DP=14197;AF=0;EAS_AF=0;EUR_AF=0;AFR_AF=0;AMR_AF=0;SAS_AF=0;AN=5100;AC=2 GT 0|0 0|0 0|0 1|1 1|1
So for this example I'd like to filter out the first entry but keep the ones after it. I have tried:
bcftools view -i 'FORMAT/GT[4] != "0/0" | FORMAT/GT[5] != "0/0"'
but this doesn't seem to work. Any help solving this would be appreciated. Thanks!
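To make the intended rule concrete, here is a minimal sketch of the same filter in plain awk rather than bcftools. It is only an illustration under a few assumptions: the file names demo.vcf and filtered.vcf are hypothetical, the INFO field is shortened, the two merged samples are the last two columns, and GT is the first FORMAT subfield.

```shell
# Build a tiny demo file (hypothetical name, shortened INFO) mirroring the excerpt above.
cat > demo.vcf <<'EOF'
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT orig_s1 orig_s2 orig_s3 merged_s1 merged_s2
chr5 11875 . G A . PASS VT=SNP GT 1|0 0|0 0|0 0/0 0/0
chr5 11956 . G C . PASS VT=SNP GT 0|0 0|0 0|0 1|1 0/0
chr5 11989 . G A . PASS VT=SNP GT 0|0 0|0 0|0 0/0 1|1
chr5 12002 . G C . PASS VT=SNP GT 0|0 0|0 0|0 1|1 1|1
EOF

# Keep header lines unchanged. Drop a record only when BOTH of the last two
# sample columns have GT equal to 0/0. GT is taken as the text before the
# first ':' (assumption: GT is the first FORMAT subfield).
awk '
  /^#/ { print; next }
  {
    split($(NF-1), a, ":")
    split($NF, b, ":")
    if (!(a[1] == "0/0" && b[1] == "0/0")) print
  }
' demo.vcf > filtered.vcf
```

With this rule only the chr5 11875 record is dropped, which is the behaviour I am hoping to reproduce with a bcftools expression.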