Good morning guys,
I am working on a genomics project, and I want to use the variant allelic frequency (VAF) from population to filter out germline variants from my VCF files. I tried with:
bcftools annotate --annotations $gnomad_variants --columns AF --output $output --output-type v $input_gz
using the exome file (85Go) from gnomad website, because I work with WES data, but I only get 3000 VAF over 300k variants in my vcf.
Do you have any clue of how to get more VAF, and in a more efficient way, to annotate my VCF files better? I insist on the fact that I don't want to get my MAF but VAF from population.
Here are some variants that have AF= and some that don't :
chr1 201123772 . C T 4.37899 . DP=2;SGB=-0.379885;RPBZ=0;MQBZ=0;BQBZ=0;NMBZ=0;SCBZ=0;FS=0;MQ0F=0;AC=1;AN=2;DP4=1,0,1,0;MQ=60;ANNOVAR_DATE=2020-06-08;Func.refGene=intergenic;Gene.refGene=ASCL5\x3bTMEM9;GeneDetail.refGene=dist\x3d8400\x3bdist\x3d10999;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END GT:PL:DP:AD:GP:GQ 0/1:35,0,35:2:1,1:0.000158064,0.999684,0.000158064:35
chr1 201135543 . G C 35.9326 . DP=9;VDB=0.0221621;SGB=-0.511536;RPBZ=0.801784;MQBZ=0;BQBZ=0.46291;NMBZ=-0.707107;SCBZ=-0.707107;FS=0;MQ0F=0;AC=1;AN=2;DP4=6,0,3,0;MQ=60;ANNOVAR_DATE=2020-06-08;Func.refGene=UTR3;Gene.refGene=TMEM9;GeneDetail.refGene=NM_001288564:c.120C>G\x3bNM_001288566:c.120C>G\x3bNM_001288568:c.120C>G\x3bNM_001288567:c.120C>G\x3bNM_001288569:c.120C>G\x3bNM_001288571:c.120C>G\x3bNM_001288570:c.120C>G\x3bNM_016456:c.120C>G\x3bNM_001288565:c.*120C>G;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END GT:PL:DP:AD:GP:GQ 0/1:69,0,123:9:6,3:1.2249e-07,1,1.28777e-13:69
chr1 201145820 . G T 72.4148 . DP=3;VDB=0.0221621;SGB=-0.511536;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,3,0;MQ=60;ANNOVAR_DATE=2020-06-08;Func.refGene=intronic;Gene.refGene=TMEM9;GeneDetail.refGene=.;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END GT:PL:DP:AD:GP:GQ 1/1:102,9,0:3:0,3:0,0,1:127
chr1 201157334 . G T 41.4148 . DP=2;VDB=0.02;SGB=-0.453602;FS=0;MQ0F=0;AC=2;AN=2;DP4=0,0,0,2;MQ=60;ANNOVAR_DATE=2020-06-08;Func.refGene=intronic;Gene.refGene=TMEM9;GeneDetail.refGene=.;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END GT:PL:DP:AD:GP:GQ 1/1:71,6,0:2:0,2:0,0,1:127
chr1 201174887 . C A 4.37899 . DP=2;SGB=-0.379885;RPBZ=0;MQBZ=0;BQBZ=0;NMBZ=0;SCBZ=0;FS=0;MQ0F=0;AC=1;AN=2;DP4=1,0,1,0;MQ=60;ANNOVAR_DATE=2020-06-08;Func.refGene=intergenic;Gene.refGene=TMEM9\x3bIGFN1;GeneDetail.refGene=dist\x3d3305\x3bdist\x3d15937;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END GT:PL:DP:AD:GP:GQ 0/1:35,0,35:2:1,1:0.000158064,0.999684,0.000158064:35
chr1 201177778 . C T 21.2837 . DP=3;SGB=-0.379885;RPBZ=1;MQBZ=0;BQBZ=1;NMBZ=-1;SCBZ=0;FS=0;MQ0F=0;AC=1;AN=2;DP4=1,0,1,0;MQ=60;ANNOVAR_DATE=2020-06-08;Func.refGene=intergenic;Gene.refGene=TMEM9\x3bIGFN1;GeneDetail.refGene=dist\x3d6196\x3bdist\x3d13046;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END GT:PL:DP:AD:GP:GQ 0/1:54,0,35:2:1,1:1.35988e-06,0.999767,0.000231332:36
chr1 201199934 . T C 155.487 . DP=40;VDB=0.230965;SGB=-0.670168;RPBZ=-2.67353;MQBZ=0;MQSBZ=0;BQBZ=0.901457;NMBZ=0.363879;SCBZ=-0.856499;FS=0;MQ0F=0;AC=1;AN=2;DP4=8,20,5,5;MQ=60;ANNOVAR_DATE=2020-06-08;Func.refGene=intronic;Gene.refGene=IGFN1;GeneDetail.refGene=.;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END GT:PL:DP:AD:GP:GQ 0/1:189,0,255:38:28,10:1.60505e-19,1,6.20086e-27:127
chr1 201209738 . A G 148.287 . DP=246;VDB=0.212968;SGB=-0.693147;RPBZ=0.999561;MQBZ=-4.13408;MQSBZ=2.76703;BQBZ=0.718089;NMBZ=7.53256;SCBZ=2.22265;FS=0;MQ0F=0.0203252;AC=1;AN=2;DP4=140,31,59,10;MQ=51;ANNOVAR_DATE=2020-06-08;Func.refGene=exonic;Gene.refGene=IGFN1;GeneDetail.refGene=.;ExonicFunc.refGene=synonymous_SNV;AAChange.refGene=IGFN1:NM_001164586:exon12:c.A4845G:p.G1615G;ALLELE_END;AF=0.0758308 GT:PL:DP:AD:GP:GQ 0/1:182,0,255:240:171,64:8.91751e-19,1,5.59366e-27:127
chr1 201209776 . A G 144.974 . DP=246;VDB=0.981223;SGB=-0.693147;RPBZ=-1.80167;MQBZ=-4.61034;MQSBZ=0.301843;BQBZ=0.822387;NMBZ=5.62233;SCBZ=-0.656655;FS=0;MQ0F=0.0772358;AC=1;AN=2;DP4=111,28,90,3;MQ=35;ANNOVAR_DATE=2020-06-08;Func.refGene=exonic;Gene.refGene=IGFN1;GeneDetail.refGene=.;ExonicFunc.refGene=nonsynonymous_SNV;AAChange.refGene=IGFN1:NM_001164586:exon12:c.A4883G:p.E1628G;ALLELE_END;AF=0.0860138 GT:PL:DP:AD:GP:GQ 0/1:178,0,255:232:139,89:1.50686e-18,1,8.31511e-27:127
chr1 201209796 . A G 176.111 . DP=244;VDB=0.004529;SGB=-0.693147;RPBZ=-6.06726;MQBZ=-5.22149;MQSBZ=1.2987;BQBZ=0.536083;NMBZ=4.28046;SCBZ=-2.26132;FS=0;MQ0F=0.0860656;AC=1;AN=2;DP4=98,29,100,4;MQ=32;ANNOVAR_DATE=2020-06-08;Func.refGene=exonic;Gene.refGene=IGFN1;GeneDetail.refGene=.;ExonicFunc.refGene=nonsynonymous_SNV;AAChange.refGene=IGFN1:NM_001164586:exon12:c.A4903G:p.K1635E;ALLELE_END;AF=0.226855 GT:PL:DP:AD:GP:GQ 0/1:209,0,255:231:127,104:1.06277e-21,1,9.36485e-27:127
chr1 201225142 . G T 18.2553 . DP=8;VDB=0.02;SGB=-0.453602;RPBZ=2.02424;MQBZ=0;MQSBZ=0;BQBZ=0.57735;NMBZ=0;SCBZ=0;FS=0;MQ0F=0;AC=1;AN=2;DP4=1,5,0,2;MQ=60;ANNOVAR_DATE=2020-06-08;Func.refGene=intronic;Gene.refGene=IGFN1;GeneDetail.refGene=.;ExonicFunc.refGene=.;AAChange.refGene=.;ALLELE_END GT:PL:DP:AD:GP:GQ 0/1:52,0,164:8:6,2:9.31038e-06,0.999991,6.74473e-18:50
Thank you very much, Samuel
Why don't you use VEP annotation and filtering?
I tried my best to make it work with a bash command, but I didn't find anyw ay to do it : could you advise me? I am new to genomics.
Thank you