The meaning of "AF1000G=..." in a VCF file
6.4 years ago
ruhollah ▴ 10

I have been given a VCF file to analyze. In some rows, the INFO field (the eighth column) contains AF1000G=x, where x is a positive number less than 1 (see the example below).

What does this mean?

More generally, where should I look for information about the meaning of the different fields and annotations in a VCF file?

Please note: I have already looked at several references and manuals on VCF files (here, here, here, here, and here), but none of them discusses the details or the related biological meaning (for example, none of them mentions AF1000G=x).

chr1    2643652 rs200640386 C   CG  .   PASS    SOMATIC;QSI=48;TQSI=1;NT=ref;QSI_NT=48;TQSI_NT=1;SGT=ref->het;MQ=60.00;MQ0=0;RU=G;RC=8;IC=9;IHP=8;SomaticEVS=16.79;AF1000G=0.022764;CSQT=1|TTC34|ENST00000401095.7|intron_variant,1|TTC34|NM_001242672.1|intron_variant;CSQR=1|ENSR00001740603|regulatory_region_variant    DP:DP2:TAR:TIR:TOR:DP50:FDP50:SUBDP50:BCN50 28:28:26,26:0,0:0,0:27.79:2.08:0.00:0.07    86:86:36,36:39,43:12,10:82.16:2.39:0.00:0.02
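A minimal sketch, using only the Python standard library, of pulling AF1000G out of the INFO column (the eighth tab-separated column) of a record like the one above. The helper names `parse_info` and `af1000g` are hypothetical; for real work a VCF library such as pysam or cyvcf2 would do the header-driven parsing for you. Because the header (quoted later in the thread) declares `Number=A`, the value is split on commas into one frequency per ALT allele.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; flag entries (no '=') map to True."""
    fields = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue  # empty INFO column or stray separator
        key, sep, value = entry.partition("=")  # split on the first '=' only
        fields[key] = value if sep else True
    return fields


def af1000g(vcf_line: str):
    """Return AF1000G as a list of floats (one per ALT allele), or None if absent."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the 8th column
    raw = info.get("AF1000G")
    if raw is None or raw is True:
        return None
    # Number=A: one comma-separated value per ALT allele; '.' marks a missing value
    return [None if v == "." else float(v) for v in raw.split(",")]


# The first eight columns of the record from the question, joined with tabs
example = "\t".join([
    "chr1", "2643652", "rs200640386", "C", "CG", ".", "PASS",
    "SOMATIC;QSI=48;TQSI=1;NT=ref;QSI_NT=48;TQSI_NT=1;SGT=ref->het;"
    "MQ=60.00;MQ0=0;RU=G;RC=8;IC=9;IHP=8;SomaticEVS=16.79;AF1000G=0.022764;"
    "CSQT=1|TTC34|ENST00000401095.7|intron_variant,"
    "1|TTC34|NM_001242672.1|intron_variant;"
    "CSQR=1|ENSR00001740603|regulatory_region_variant",
])
print(af1000g(example))  # prints [0.022764]
```

For this record the result is a single value because there is only one ALT allele (CG); a multi-allelic site would yield one entry per ALT allele.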

This should be explained in the header of the VCF. I assume it is the allele frequency of the ALT allele in the 1000 Genomes Project, a whole-genome sequencing project carried out a few years ago across different human populations with the aim of cataloguing common human variation.
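A minimal sketch, with made-up genotypes, of what "allele frequency of the ALT allele" means: the number of ALT alleles (AC) divided by the number of called alleles (AN), counted over chromosomes rather than over people. The function name `alt_allele_frequency` and the sample data are hypothetical; this illustrates the definition only and is not the 1000 Genomes pipeline.

```python
def alt_allele_frequency(genotypes, alt_index="1"):
    """Return AC/AN for one ALT allele from VCF-style GT strings such as '0/1' or '1|1'.

    AN counts every called allele; missing alleles ('.') are skipped.
    Returns None when no allele was called at all.
    """
    ac = an = 0
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') genotypes the same way
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call: not part of AN
            an += 1
            if allele == alt_index:
                ac += 1
    return ac / an if an else None


# Five made-up diploid samples: 8 called alleles, 3 of them ALT
samples = ["0/0", "0/1", "1|1", "0/0", "./."]
print(alt_allele_frequency(samples))  # prints 0.375
```

Note that 2 of the 4 called samples carry the ALT allele (50%), while the allele frequency is 3/8 = 37.5%, so an AF value is not the fraction of people carrying the variant.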


See here: https://support.illumina.com/help/BaseSpace_App_WGS_v5_OLH_15050955_02/Content/Source/Informatics/Apps/IAE_app.htm (first hit on DuckDuckGo):

AF1000G The allele frequency from all populations of 1000 genomes data.


Thank you! Yes, it says so in the header:

##INFO=<ID=AF1000G,Number=A,Type=Float,Description="The allele frequency from all populations of 1000 genomes data">

What can be inferred about the variant in the corresponding row of the VCF file?


If the value is, say, 0.95, then the ALT allele makes up 95% of all the alleles (chromosomes) sequenced in the 1000 Genomes Project, which makes it a very common variant. In your example, AF1000G=0.022764, so about 2.3% of the sequenced alleles carry the inserted G. What you make of this information depends on your question. If you want to connect a disease with such a variant, you will have a hard time convincing people that it is causal, compared with a variant that has an AF of 0.00001, or even 0, in a healthy human population.
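A minimal sketch of this interpretation. It rests on two assumptions that do not come from the thread: (1) a hypothetical 1% cutoff between "rare" and "common", and (2) Hardy-Weinberg equilibrium, used only to translate an allele frequency into the expected fraction of people carrying at least one ALT copy. The names `RARE_CUTOFF`, `frequency_class` and `hardy_weinberg` are illustrative.

```python
RARE_CUTOFF = 0.01  # hypothetical cutoff: 1% is a common convention, not a fixed rule


def frequency_class(af, cutoff=RARE_CUTOFF):
    """Label a population allele frequency as 'rare' or 'common' (None -> 'unknown')."""
    if af is None:
        return "unknown"  # AF1000G missing means not annotated, not proof of absence
    return "rare" if af < cutoff else "common"


def hardy_weinberg(p):
    """Expected genotype fractions for a biallelic site with ALT allele frequency p,
    assuming Hardy-Weinberg equilibrium."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return {
        "hom_ref": q * q,
        "het": 2.0 * p * q,
        "hom_alt": p * p,
        "carriers": 1.0 - q * q,  # individuals with at least one ALT copy
    }


for af in (0.95, 0.022764, 0.00001):
    hw = hardy_weinberg(af)
    print(f"AF={af}: {frequency_class(af)}, ~{hw['carriers']:.2%} of people carry ALT")
```

Under these assumptions an AF of 0.95 corresponds to roughly 99.75% of individuals carrying at least one ALT copy, and the example's 0.022764 to about 4.5%, which is why an allele frequency should not be read directly as a fraction of samples.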

6.4 years ago
GenoMax 147k

Have you looked in the VCF header to see if it is defined there? If I had to guess, it looks like the allele frequency from the 1000 Genomes Project.

Edit: That looks correct.

AF1000G The allele frequency from all populations of 1000 genomes data.
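Checking the header, as suggested here, is the general answer to "where do I find what an annotation means": every INFO key should be declared in a `##INFO=<...>` meta-information line. A minimal sketch of looking up such a declaration with Python's `re` module; the helper names are hypothetical, the `##fileformat` line is a placeholder, and escaped quotes inside a Description are matched but not unescaped.

```python
import re

# key=value pairs inside <...>; a value is a quoted string or runs to the next ',' or '>'
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,>]+)')


def parse_info_header(line: str) -> dict:
    """Parse one ##INFO=<...> meta-information line into a dict of its fields."""
    body = line.strip()
    if not body.startswith("##INFO=<") or not body.endswith(">"):
        raise ValueError("not an ##INFO meta-information line")
    inner = body[len("##INFO=<"):-1]
    meta = {}
    for key, value in _PAIR.findall(inner):
        if value.startswith('"'):
            value = value[1:-1]  # drop the surrounding quotes
        meta[key] = value
    return meta


def find_info_definition(header_lines, info_id):
    """Return the parsed ##INFO declaration whose ID matches info_id, or None."""
    for line in header_lines:
        if line.startswith("##INFO=<"):
            meta = parse_info_header(line)
            if meta.get("ID") == info_id:
                return meta
    return None


header = [
    "##fileformat=VCFv4.1",  # placeholder; the real file's version may differ
    '##INFO=<ID=AF1000G,Number=A,Type=Float,'
    'Description="The allele frequency from all populations of 1000 genomes data">',
]
meta = find_info_definition(header, "AF1000G")
print(meta["Number"], meta["Type"])  # prints A Float
print(meta["Description"])
```

The `Number=A` field is what tells you to expect one value per ALT allele, and `Type=Float` that the value is a fraction between 0 and 1.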
