Question

Ensembl VEP gnomAD annotated allele frequencies different from gnomAD browser

3.6 years ago

KitScorpion ▴ 10

I've annotated some variants using VEP and was looking at the minor allele frequencies (MAFs). Some of the variants had very different MAFs in the annotation than I expected (I expected MAF < 1%, whereas some annotated MAFs were >50%). I looked up the same variants in the gnomAD v3 browser, and all the ones I've checked had MAFs much more in line with my expectation, and so very different from the MAFs annotated by VEP. One example: 19:35768033:C:T was annotated with a MAF of 36% (NFE), whereas gnomAD v3.1.2 lists the NFE MAF as 0.0456%.

I ran the annotation using the following command:

vep --input_file input.vcf \
--output_file anno.tab \
--format vcf \
--tab --symbol --hgvs --tsl \
--terms SO \
--fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz \
--offline \
--cache --dir_cache  /anno_cache \
--plugin CADD,whole_genome_SNVs.tsv.gz,gnomad.genomes.r3.0.indel.tsv.gz \
--af_gnomad gnomAD_NFE_AF

Edit: Another example: 1:1719393:A:G, MAF_NFE according to gnomAD: 0.0. Here are the corresponding lines in the annotation:

#Uploaded_variation Location    Allele  Gene    Feature Feature_type    Consequence cDNA_position   CDS_position    Protein_position    Amino_acids Codons  Existing_variation  IMPACT  DISTANCE    STRAND  FLAGS   SYMBOL  SYMBOL_SOURCE   HGNC_ID TSL HGVSc   HGVSp   HGVS_OFFSET gnomAD_AF   gnomAD_AFR_AF   gnomAD_AMR_AF   gnomAD_ASJ_AF   gnomAD_EAS_AF   gnomAD_FIN_AF   gnomAD_NFE_AF   gnomAD_OTH_AF   gnomAD_SAS_AF   CLIN_SIG    SOMATIC PHENO   CADD_PHRED  CADD_RAW
1:1719393:A:G   1:1719393   G   ENSG00000008128 ENST00000356200 Transcript  missense_variant    423 188 63  V/A gTt/gCt rs72909030,COSV62264367 MODERATE    -   -1  -   CDK11A  HGNC    HGNC:1730   5   ENST00000356200.7:c.188T>C  ENSP00000348529.2:p.Val63Ala    -   0.4984  0.4963  0.4969  0.4982  0.4964  0.5 0.4992  0.4994  0.498   -   0,1 0,1 18.49   1.900858
1:1719393:A:G   1:1719393   G   ENSG00000008128 ENST00000356937 Transcript  non_coding_transcript_exon_variant  123 -   -   -   -   rs72909030,COSV62264367 MODIFIER    -   -1  -   CDK11A  HGNC    HGNC:1730   1   ENST00000356937.7:n.123T>C  -   -   0.4984  0.4963  0.4969  0.4982  0.4964  0.5 0.4992  0.4994  0.498   -   0,1 0,1 18.49   1.900858
1:1719393:A:G   1:1719393   G   ENSG00000008128 ENST00000357760 Transcript  missense_variant    370 290 97  V/A gTt/gCt rs72909030,COSV62264367 MODERATE    -   -1  -   CDK11A  HGNC    HGNC:1730   1   ENST00000357760.6:c.290T>C  ENSP00000350403.2:p.Val97Ala    -   0.4984  0.4963  0.4969  0.4982  0.4964  0.5 0.4992  0.4994  0.498   -   0,1 0,1 18.49   1.900858
1:1719393:A:G   1:1719393   G   ENSG00000008128 ENST00000358779 Transcript  missense_variant    370 290 97  V/A gTt/gCt rs72909030,COSV62264367 MODERATE    -   -1  -   CDK11A  HGNC    HGNC:1730   1   ENST00000358779.9:c.290T>C  ENSP00000351629.5:p.Val97Ala    -   0.4984  0.4963  0.4969  0.4982  0.4964  0.5 0.4992  0.4994  0.498   -   0,1 0,1 18.49   1.900858
1:1719393:A:G   1:1719393   G   ENSG00000008128 ENST00000378633 Transcript  missense_variant    370 290 97  V/A gTt/gCt rs72909030,COSV62264367 MODERATE    -   -1  -   CDK11A  HGNC    HGNC:1730   1   ENST00000378633.5:c.290T>C  ENSP00000367900.1:p.Val97Ala    -   0.4984  0.4963  0.4969  0.4982  0.4964  0.5 0.4992  0.4994  0.498   -   0,1 0,1 18.49   1.900858
1:1719393:A:G   1:1719393   G   ENSG00000008128 ENST00000378638 Transcript  missense_variant    350 188 63  V/A gTt/gCt rs72909030,COSV62264367 MODERATE    -   -1  -   CDK11A  HGNC    HGNC:1730   5   ENST00000378638.6:c.188T>C  ENSP00000367905.1:p.Val63Ala    -   0.4984  0.4963  0.4969  0.4982  0.4964  0.5 0.4992  0.4994  0.498   -   0,1 0,1 18.49   1.900858
1:1719393:A:G   1:1719393   G   ENSG00000008128 ENST00000401096 Transcript  downstream_gene_variant -   -   -   -   -   rs72909030,COSV62264367 MODIFIER    3315    -1  cds_end_NF  CDK11A  HGNC    HGNC:1730   5   -   -   -   0.4984  0.4963  0.4969  0.4982  0.4964  0.5 0.4992  0.4994  0.498   -   0,1 0,1 18.49   1.900858
1:1719393:A:G   1:1719393   G   ENSG00000008128 ENST00000404249 Transcript  missense_variant    403 290 97  V/A gTt/gCt rs72909030,COSV62264367 MODERATE    -   -1  -   CDK11A  HGNC    HGNC:1730   1   ENST00000404249.8:c.290T>C  ENSP00000384442.3:p.Val97Ala    -   0.4984  0.4963  0.4969  0.4982  0.4964  0.5 0.4992  0.4994  0.498   -   0,1 0,1 18.49   1.900858
1:1719393:A:G   1:1719393   G   ENSG00000008128 ENST00000460465 Transcript  missense_variant,NMD_transcript_variant 370 290 97  V/A gTt/gCt rs72909030,COSV62264367 MODERATE    -   -1  -   CDK11A  HGNC    HGNC:1730   1   ENST00000460465.5:c.290T>C  ENSP00000462289.1:p.Val97Ala    -   0.4984  0.4963  0.4969  0.4982  0.4964  0.5 0.4992  0.4994  0.498   -   0,1 0,1 18.49   1.900858
1:1719393:A:G   1:1719393   G   ENSG00000008128 ENST00000479362 Transcript  missense_variant    536 290 97  V/A gTt/gCt rs72909030,COSV62264367 MODERATE    -   -1  cds_end_NF  CDK11A  HGNC    HGNC:1730   1   ENST00000479362.1:c.290T>C  ENSP00000423900.1:p.Val97Ala    -   0.4984  0.4963  0.4969  0.4982  0.4964  0.5 0.4992  0.4994  0.498   -   0,1 0,1 18.49   1.900858
1:1719393:A:G   1:1719393   G   ENSG00000008128 ENST00000487462 Transcript  downstream_gene_variant -   -   -   -   -   rs72909030,COSV62264367 MODIFIER    3073    -1  -   CDK11A  HGNC    HGNC:1730   5   -   -   -   0.4984  0.4963  0.4969  0.4982  0.4964  0.5 0.4992  0.4994  0.498   -   0,1 0,1 18.49   1.900858
1:1719393:A:G   1:1719393   G   ENSG00000008128 ENST00000498810 Transcript  non_coding_transcript_exon_variant  347 -   -   -   -   rs72909030,COSV62264367 MODIFIER    -   -1  -   CDK11A  HGNC    HGNC:1730   2   ENST00000498810.1:n.347T>C  -   -   0.4984  0.4963  0.4969  0.4982  0.4964  0.5 0.4992  0.4994  0.498   -   0,1 0,1 18.49   1.900858
1:1719393:A:G   1:1719393   G   ENSG00000008128 ENST00000509982 Transcript  missense_variant,NMD_transcript_variant 304 290 97  V/A gTt/gCt rs72909030,COSV62264367 MODERATE    -   -1  -   CDK11A  HGNC    HGNC:1730   5   ENST00000509982.5:c.290T>C  ENSP00000422149.1:p.Val97Ala    -   0.4984  0.4963  0.4969  0.4982  0.4964  0.5 0.4992  0.4994  0.498   -   0,1 0,1 18.49   1.900858
1:1719393:A:G   1:1719393   G   ENSG00000268575 ENST00000598846 Transcript  non_coding_transcript_exon_variant  3054    -   -   -   -   rs72909030,COSV62264367 MODIFIER    -   -1  -   -   -   -   2   ENST00000598846.1:n.3054T>C -   -   0.4984  0.4963  0.4969  0.4982  0.4964  0.5 0.4992  0.4994  0.498   -   0,1 0,1 18.49   1.900858

Is there an explanation for these discrepancies? Am I making a mistake in my annotation and if so, might the other data fields (particularly gene and consequence) be affected?

gnomad vep annotation ensembl • 1.4k views

updated 3.6 years ago by Ben Moore ★ 2.4k • written 3.6 years ago by KitScorpion ▴ 10

The VEP includes gnomAD r2.1.1 exomes only. To include gnomAD v3 data in the VEP output, you should use custom annotation: https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
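
As a rough sketch of what that could look like, the snippet below builds a `--custom` argument in the positional form (file, short name, format, match type, coordinate flag, then INFO fields) and prints the resulting command for review. The VCF path is a hypothetical local copy of a gnomAD v3 sites file, which would need to be bgzipped and tabix-indexed, and the INFO field names `AF` and `AF_nfe` are assumptions to confirm against the header of the file you download. Newer VEP releases document a key=value form of `--custom`, so check the linked page for the version you run.

```shell
# Hypothetical path to a locally downloaded, tabix-indexed gnomAD v3 sites VCF
# (assumption: adjust to the file you actually have).
GNOMAD_VCF="gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz"

# Positional --custom form: file,short_name,format,type,coords,INFO fields...
# "exact" restricts matches to identical coordinates; AF and AF_nfe are assumed
# INFO field names and should be checked against the VCF header.
CUSTOM_ARG="${GNOMAD_VCF},gnomADg,vcf,exact,0,AF,AF_nfe"

# Dry run: print the command for review. Remove the leading `echo` to run it.
echo vep --input_file input.vcf --output_file anno.tab --format vcf --tab \
  --offline --cache --dir_cache /anno_cache \
  --custom "$CUSTOM_ARG"
```

With a short name of `gnomADg`, the extra output columns would be expected to carry that prefix (for example `gnomADg_AF_nfe`), alongside the existing `gnomAD_*` columns from the v2 exomes.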

The frequencies returned by VEP are correct: in your example, 1:1719393:A:G has an NFE frequency of 0.4992 in v2 (exomes only), while in v3.1.2 (genomes) it is 0.0.
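
If you want to check which value VEP wrote for a particular variant before comparing it with the browser, a small helper along these lines can look a column up by name in the `--tab` output. This is only a sketch: `vep_column` is a hypothetical function name, and it assumes the file is tab-separated with a header line starting with `#Uploaded_variation`, as in the output posted above.

```shell
# Print the value of one named column for one variant in VEP --tab output.
# Usage: vep_column FILE VARIANT_ID COLUMN_NAME
vep_column() {
  awk -F'\t' -v id="$2" -v col="$3" '
    # Column header line: remember the index of the requested column.
    /^#Uploaded_variation/ {
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      next
    }
    # Skip any other comment or metadata lines.
    /^#/ { next }
    # Frequencies repeat on every transcript row, so print the first match and stop.
    c && $1 == id { print $c; exit }
  ' "$1"
}
```

For the rows shown in the question, `vep_column anno.tab "1:1719393:A:G" gnomAD_NFE_AF` would be expected to print 0.4992, which is the v2 exome value rather than the v3 genome value shown in the browser.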

I'm not sure why there is such a difference in frequency, but it could be due to differences in how well the region is covered in the exome and genome datasets. The gnomAD documentation explains a little about allele frequency differences between the datasets:

"Therefore gnomAD v2 is still our recommended dataset for most coding regions analyses. However, gnomAD v3.1 represents a very large increase in the number of genomes, and will therefore be a much better resource if your primary interest is in non-coding regions or if your coding region of interest is poorly captured in the gnomAD exome"

3.6 years ago by Ben Moore ★ 2.4k