I have been given a VCF file to analyze. In the INFO column of some rows I occasionally see AF1000G=x, where x is a positive number less than 1 (see the example row below).
What does this mean?
More generally, where should I look for information about the meaning of the different annotations in a VCF file?
Please note: I have already looked at several references and manuals on VCF files (here, here, here, here, and here), but none of them covers these details or their biological meaning (e.g. none mentions AF1000G=x).
chr1 2643652 rs200640386 C CG . PASS SOMATIC;QSI=48;TQSI=1;NT=ref;QSI_NT=48;TQSI_NT=1;SGT=ref->het;MQ=60.00;MQ0=0;RU=G;RC=8;IC=9;IHP=8;SomaticEVS=16.79;AF1000G=0.022764;CSQT=1|TTC34|ENST00000401095.7|intron_variant,1|TTC34|NM_001242672.1|intron_variant;CSQR=1|ENSR00001740603|regulatory_region_variant DP:DP2:TAR:TIR:TOR:DP50:FDP50:SUBDP50:BCN50 28:28:26,26:0,0:0,0:27.79:2.08:0.00:0.07 86:86:36,36:39,43:12,10:82.16:2.39:0.00:0.02
This should be explained in the header of the VCF. I assume it is the frequency of the ALT allele in the 1000 Genomes Project, a whole-genome sequencing project across different human populations from a few years ago that aimed to catalogue common human variation.
see here: https://support.illumina.com/help/BaseSpace_App_WGS_v5_OLH_15050955_02/Content/Source/Informatics/Apps/IAE_app.htm (first hit on DDG)
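To check what the header says about a given annotation, you can pull the Description out of the ##INFO meta lines. The following is only a minimal sketch: the function name info_descriptions and the example header lines are hypothetical (they are not copied from your file), and the regular expression assumes the usual ##INFO=<ID=...,...,Description="..."> layout.

```python
import re

# Matches lines such as: ##INFO=<ID=XYZ,Number=1,Type=Float,Description="...">
INFO_LINE = re.compile(r'^##INFO=<ID=([^,]+),.*Description="([^"]*)"')


def info_descriptions(header_lines):
    """Map each INFO ID to the Description given in its ##INFO meta line."""
    descriptions = {}
    for line in header_lines:
        match = INFO_LINE.match(line)
        if match:
            descriptions[match.group(1)] = match.group(2)
    return descriptions


# Hypothetical header lines for illustration only; the description text is a placeholder.
example_header = [
    "##fileformat=VCFv4.1",
    '##INFO=<ID=AF1000G,Number=A,Type=Float,Description="Placeholder description text">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]

print(info_descriptions(example_header).get("AF1000G"))
```

With a real file you would pass in the lines that start with ## instead of example_header.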
Thank you! Yes; it says so in the header:
What can be inferred about the variation in the corresponding row of the VCF file?
If the number is, say, 0.95, then the ALT allele makes up 95% of the alleles observed at that position across the samples sequenced in the 1KG project, which makes it a very common variant. In your example row, AF1000G=0.022764 corresponds to roughly 2.3% of alleles. What you make of this information is up to you. If you want to link a disease to a common variant like that, you will have a much harder time convincing people that it is causal than you would with a variant that has an AF of 0.00001 or even 0 in a healthy human population.
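To use this value in practice, you could split the INFO column and read AF1000G as a number. This is a rough sketch under a few assumptions: the helper names parse_info and af1000g are made up for illustration, the INFO string is an abbreviated copy of the row in the question, the record has a single ALT allele, and the 0.01 cutoff is only an illustrative threshold that you should choose yourself.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    info = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def af1000g(info_field):
    """Return AF1000G as a float, or None when the row has no such annotation."""
    value = parse_info(info_field).get("AF1000G")
    # Assumes one ALT allele, so the value is a single number.
    return None if value is None else float(value)


# Abbreviated INFO column from the example row in the question.
example_info = "SOMATIC;QSI=48;TQSI=1;NT=ref;SomaticEVS=16.79;AF1000G=0.022764"

COMMON_CUTOFF = 0.01  # illustrative assumption, not a standard taken from this thread

af = af1000g(example_info)
if af is None:
    print("no AF1000G annotation on this row")
elif af >= COMMON_CUTOFF:
    print(f"ALT allele frequency {af:.2%}: at or above the cutoff")
else:
    print(f"ALT allele frequency {af:.2%}: below the cutoff")
```

Rows without AF1000G return None here, so they can be handled separately instead of being treated as frequency 0.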