Using VEP to get gnomAD frequencies
2
3
Entering edit mode
5.9 years ago

Hi all,

I am using Ensembl VEP (command line) to annotate a VCF I have. I am specifically looking for gnomAD allele frequencies, which is fairly straight forward to do, technically speaking. However, the data looks off in some cases.

For example, when I pass in:

10  69408929    COSM3751912 A   T   .   .   GENE=TACR2;STRAND=-;CDS=c.734T>A;AA=p.M245K

I get the VEP output:

COSM3751912 10:69408929 T   ENSG00000075073 ENST00000373306 Transcript  missense_variant    1278    734 245 M/K aTg/aAg rs55953810  MODERATE    -   -1  -   TACR2   HGNC    HGNC:11527  YES ENSP00000362403 P21452  -   UPI0000061EE3   -   3/5 -   Gene3D:1.20.1070.10,Pfam_domain:PF00001,PROSITE_profiles:PS50262,hmmpanther:PTHR43919,hmmpanther:PTHR43919:SF4,SMART_domains:SM01381,Superfamily_domains:SSF81321,Conserved_Domains:cd16004 1   1   1   1   1   0.9999  0.9999  0.9999  1   0.9999  1   0.9999  1   1   1   gnomAD_ASJ,gnomAD_FIN,gnomAD_OTH,gnomAD_SAS,AFR,AMR,EAS,EUR,SAS -   -   -   -   -   -   -

Jumbling through that, you can see the allele frequencies for gnomAD_AF is 0.9999. This seems odd to me. How could this variant be a COSMIC (cancer database) missense variant, with MODERATE consequence, and have 99.99% frequency. I'm lost on how to interpret this.

Maybe I am misunderstanding how gnomAD scores allele frequencies, hence posting this question here.

Does anyone know how gnomAD allele frequencies (as outputted by Ensembl's VEP) should be interpreted?

annotation frequency allele frequency vep gnomAD • 7.6k views
ADD COMMENT
0
Entering edit mode

For some reason, it's giving you the 1-AF and not the AF. The variant at the chromosome level is an A>T with AF < 0.01, but at the transcript level becomes a T>A. Maybe that's the reason the AF is being inverted?

ADD REPLY
0
Entering edit mode

Interesting. Could that be because the strand is -1?

ADD REPLY
0
Entering edit mode

Answered my own question by looking at more data. Nope, there are plenty of +1 strand entries with the same frequency issue as the first example.

ADD REPLY
0
Entering edit mode

there is no variant in gnomad at this position: http://gnomad.broadinstitute.org/variant/10-69408929-A-T

ADD REPLY
0
Entering edit mode

I did notice that as well. So why would VEP be returning anything then? Seems odd (again). There are other entries with a - in place of the AF (I'm assuming that means no data).

ADD REPLY
0
Entering edit mode

Can it be used to annotate a vcd file?

ADD REPLY
1
Entering edit mode

Why is this added as an answer? Please delete it and add a comment/reply to an appropriate post.

EDIT: After nearly 2 years of no activity from Ahmed, I've moved this post to a comment on the top level post.

ADD REPLY
2
Entering edit mode
5.9 years ago
Emily 24k

The reference allele, in this case, appears to be a very rare allele. The reference allele is whatever was found in the reference genome, which is a genomic region of a real person, which means that it can be a rare or private allele. In this case, the reference, A, is very rare.

The VEP is giving you the allele frequency of the alternative allele for the variant at this locus, which is rs55953810. The alternative allele given in your VEP input is T, so the allele frequency it gives you is for T.

VEP reads VCF format as standard, assuming that the alleles are the forward strand alleles. Your specification of strand in the INFO column is ignored because this is not the standard way to write VCF. This is why your alleles have not been converted.

It appears that you have run your VEP input against GRCh38. gnomAD coordinates are on GRCh37, and are remapped onto GRCh38, so can be looked up by the VEP to find the allele frequencies. This is why there is no variant at that locus in gnomAD, but that variant identifier and its frequencies do exist.

ADD COMMENT
0
Entering edit mode

This is a big help in understanding whats going on. Thank you. Would you recommend converting the VCF to have (+) strand alleles only? As a side note, this VCF is directly downloaded from cosmic, so I suppose it makes sense that the "ref" alleles might be super rare.

ADD REPLY
0
Entering edit mode

Yes, I think that standard convention states that VCF should have only forward strand alleles. Perhaps this is something to raise with COSMIC if they are putting reverse strand alleles in their VCFs.

ADD REPLY
0
Entering edit mode

Hi Emily_Ensembl ,

I am trying to annotate my VCF file (already annotated using snpEff) using VEP for gnomad. The resulted annotation files however do not have any information for allele frequency from Gnomad. The columns are there but are all empty. I would be thankful if you could let me know what I should do to get it right. The VCF files are aligned on hg38.

Below are some codes that I used:

  • using the gnomad data (only exome) from VEP cash

    ./vep --cache --force_overwrite -i input.snpEff.vcf \
    --output_file annotated_1.vcf --stats_file annotated_1.summary.html \
    --cache --species homo_sapiens --af_gnomad --max_af \ 
    --vcf_info_field --pubmed --vcf
    
  • I downloaded the liftover exome and genome data from gnomad website

    ./vep --cache --force_overwrite -i input.snpEff.vcf \ 
    --output_file annotated_2.vcf --stats_file annotated_2.summary.html \
    --cache --species homo_sapiens --custom /home/ubuntu/databases/gnomad_v2/gnomad.genomes.r2.1.1.sites.liftover_grch38.vcf.bgz,gnomADg,vcf,exact,0,AF_AFR,AF_AMR,AF_ASJ,AF_EAS,AF_FIN,AF_NFE,AF_OTH \
    --custom /home/ubuntu/workspace/databases/gnomad_v2/gnomad.exomes.r2.1.1.sites.liftover_grch38.vcf.bgz,gnomADg,vcf,exact,0,AF_AFR,AF_AMR,AF_ASJ,AF_EAS,AF_FIN,AF_NFE,AF_OTH \
    --vcf_info_field --pubmed --vcf
    
ADD REPLY
1
Entering edit mode

Do not add answers unless you're answering the top level question. Use Add Comment or Add Reply as appropriate instead. I've moved your post to the right spot this time, but please be more careful in the future.

ADD REPLY
1
Entering edit mode

Can you please send a sample of your input and output files to helpdesk@ensembl.org

ADD REPLY
0
Entering edit mode

Thanks! I just sent you the email.

ADD REPLY
1
Entering edit mode
3.3 years ago
Kalin ▴ 50

I created a python package based on SQLite databases, where you can easily query all gnomAD variants for GRCh37/38, without having to install VEP. https://github.com/KalinNonchev/gnomAD_MAF I have precomputed SQLite databases for gnomAD WGS for GRCh37/38 in the description of the package. Please take a look there.

Best,

ADD COMMENT

Login before adding your answer.

Traffic: 2047 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6