Filter VCF SNPs by sample GT value
16 months ago

I have a merged VCF file with multiple samples over a joined SNP set. The original samples have phased genotypes (GT) of 0|0, 0|1, 1|0 or 1|1, while the merged samples are set to 0/0 wherever the SNP was missing (the --missing-to-ref option of bcftools merge). I'd like to filter the file to remove the SNPs that have 0/0 in the GT field of both of the last two columns. Here's an excerpt from the VCF file (the header and the first four lines):

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  orig_s1 orig_s2 orig_s3  merged_s1 merged_s2
chr5    11875   .       G       A       .       PASS    VT=SNP;NS=2548;DP=1908;AF=0.05;EAS_AF=0;EUR_AF=0.01;AFR_AF=0.17;AMR_AF=0.02;SAS_AF=0.02;AN=5100;AC=267  GT      1|0     0|0     0|0     0/0     0/0
chr5    11956   .       G       C       .       PASS    VT=SNP;NS=2548;DP=13160;AF=0.03;EAS_AF=0;EUR_AF=0;AFR_AF=0.11;AMR_AF=0.01;SAS_AF=0;AN=5100;AC=148       GT      0|0     0|0     0|0     1|1     0/0
chr5    11989   .       G       A       .       PASS    VT=SNP;NS=2548;DP=14138;AF=0;EAS_AF=0;EUR_AF=0;AFR_AF=0;AMR_AF=0;SAS_AF=0;AN=5100;AC=1                  GT      0|0     0|0     0|0     0/0     1|1
chr5    12002   .       G       C       .       PASS    VT=SNP;NS=2548;DP=14197;AF=0;EAS_AF=0;EUR_AF=0;AFR_AF=0;AMR_AF=0;SAS_AF=0;AN=5100;AC=2                  GT      0|0     0|0     0|0     1|1     1|1

So, for this example, I'd like to filter out the first entry but keep the ones after it. I have tried bcftools view -i 'FORMAT/GT[4] != "0/0" | FORMAT/GT[5] != "0/0"', but this doesn't seem to work. Any help would be appreciated. Thanks!
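As a minimal sketch of how the sample columns are indexed, the Python below parses the excerpt above and maps each sample name to its position among the sample columns. For illustration only, INFO is abbreviated to "." and columns are separated by spaces instead of tabs. As far as I know, bcftools sample subscripts in filtering expressions are 0-based (GT[0] is the first sample), so with five samples the valid positions are 0 to 4 and the last two samples (merged_s1, merged_s2) sit at 3 and 4, not 4 and 5. The helper names sample_indices and gt_values are hypothetical, and the parser assumes a plain-text VCF with a GT key in FORMAT.

```python
# Sketch: locate the last two samples by 0-based index and pull their GT values.
# Assumption: plain-text VCF, FORMAT column contains a GT key (INFO abbreviated here).

EXCERPT = """\
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT orig_s1 orig_s2 orig_s3 merged_s1 merged_s2
chr5 11875 . G A . PASS . GT 1|0 0|0 0|0 0/0 0/0
chr5 11956 . G C . PASS . GT 0|0 0|0 0|0 1|1 0/0
chr5 11989 . G A . PASS . GT 0|0 0|0 0|0 0/0 1|1
chr5 12002 . G C . PASS . GT 0|0 0|0 0|0 1|1 1|1
"""


def sample_indices(header_line):
    """Map each sample name to its 0-based index among the sample columns."""
    cols = header_line.lstrip("#").split()
    samples = cols[cols.index("FORMAT") + 1:]
    return {name: i for i, name in enumerate(samples)}


def gt_values(record_line):
    """Return the GT string of every sample in one VCF data line."""
    cols = record_line.split()
    gt_pos = cols[8].split(":").index("GT")  # where GT sits within FORMAT
    return [sample.split(":")[gt_pos] for sample in cols[9:]]


header, *records = EXCERPT.strip().splitlines()
idx = sample_indices(header)
last_two = (idx["merged_s1"], idx["merged_s2"])
print(len(idx), last_two)  # 5 (3, 4)

# GT values of the last two samples, keyed by POS
pairs = {}
for line in records:
    gts = gt_values(line)
    pairs[int(line.split()[1])] = tuple(gts[i] for i in last_two)
print(pairs)
```

Looking GT up by its position in FORMAT, rather than assuming it is the only key, keeps the sketch working on records whose FORMAT column is, for example, GT:DP.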

bcftools variant-calling snp
16 months ago

With GATK SelectVariants (https://gatk.broadinstitute.org/hc/en-us/articles/360037055952-SelectVariants), you could use a JEXL select expression (not tested):

'!vc.getGenotype(4).isHomRef() && !vc.getGenotype(5).isHomRef()'
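In htsjdk, the library GATK uses for VCF records, Genotype.isHomRef() is true for a called genotype whose alleles are all the reference allele, regardless of phasing. The Python sketch below imitates that check on raw GT strings (is_hom_ref is a hypothetical helper, not part of any library) and contrasts it with a literal string match on "0/0". The difference can matter for this file: per the question, 0/0 in the merged samples marks a SNP filled in by bcftools merge, while genotypes actually present in the data appear phased (e.g. 1|1), so a real reference call would presumably be 0|0.

```python
def is_hom_ref(gt):
    """Approximate Genotype.isHomRef() on a VCF GT string: every allele must be
    called and equal to the reference allele (0); '|' and '/' are treated alike."""
    alleles = gt.replace("|", "/").split("/")
    return all(allele == "0" for allele in alleles)


for gt in ["0/0", "0|0", "0|1", "1|1", "./."]:
    # Columns: GT, hom-ref in the isHomRef() sense, literal match on "0/0"
    print(gt, is_hom_ref(gt), gt == "0/0")
# 0/0 True True
# 0|0 True False
# 0|1 False False
# 1|1 False False
# ./. False False
```

An expression built on isHomRef() would therefore also treat genuine 0|0 calls in the last two samples as hom-ref, whereas a match on the string "0/0" only catches the filled-in genotypes.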
16 months ago
Axzd

You are using a pipe, which means "or", but from your wording you want to exclude SNPs with a GT of 0/0 in both of the last two samples. If that is the case, replace "|" with "&" in your command and see if it gives you what you want.
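Whether "|" or "&" is right depends on which records should survive. The question expects only the first example record (POS 11875) to be removed, i.e. a record is dropped only when both merged samples are 0/0. By De Morgan's law, "not (s1 is 0/0 and s2 is 0/0)" equals "s1 is not 0/0 or s2 is not 0/0", so written as an include condition with "!=" that rule uses "or", while written as an exclude condition (bcftools view also accepts -e/--exclude) with "=" it uses "and". The Python sketch below checks this with plain string comparisons on the last two GT values of the four example records. It is only a truth-table illustration and does not model how bcftools itself evaluates & and | over per-sample FORMAT values.

```python
# Last two GT values (merged_s1, merged_s2) of the four example records, keyed by POS.
RECORDS = {
    11875: ("0/0", "0/0"),
    11956: ("1|1", "0/0"),
    11989: ("0/0", "1|1"),
    12002: ("1|1", "1|1"),
}


def include_or(s1, s2):
    # Include if at least one of the two samples is not the filler 0/0.
    return s1 != "0/0" or s2 != "0/0"


def include_and(s1, s2):
    # Include only if neither sample is 0/0.
    return s1 != "0/0" and s2 != "0/0"


def exclude_both_ref(s1, s2):
    # The stated goal: drop a record only when BOTH samples are 0/0.
    return s1 == "0/0" and s2 == "0/0"


kept_or = [pos for pos, gts in RECORDS.items() if include_or(*gts)]
kept_and = [pos for pos, gts in RECORDS.items() if include_and(*gts)]
kept_goal = [pos for pos, gts in RECORDS.items() if not exclude_both_ref(*gts)]

print("OR:  ", kept_or)    # [11956, 11989, 12002]
print("AND: ", kept_and)   # [12002]
print("goal:", kept_goal)  # [11956, 11989, 12002]
```

With the expected result from the question, the OR include form matches; the AND include form also drops records where only one merged sample is 0/0.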
