Why are there discrepancies between VEP output and the "variation" section of Ensembl?
6.4 years ago
seta ★ 1.9k

Hi all,

I used Ensembl's VEP to annotate a list of SNPs derived from whole-genome sequencing of a population. When I examined some of them, I found differences between the VEP output and what the "variation" section of Ensembl shows. For example, the strand for rs104895094 is reported as minus (-1), while it is forward at enter link description here. Also, VEP used "A" as the alternative allele to calculate the consequence, while the minor allele for this SNP is C at enter link description here. Could you please tell me what is going on?

Also, is it right to say that the variant allele, the alternative allele, and the minor allele are the same thing when AF < 0.5?

Many thanks

SNP VEP output alternative allele Ensembl • 2.7k views

1) Regarding the strand discrepancy between dbSNP (rs) and Ensembl: the gene is on the reverse strand, while Ensembl represents the variant location on the forward strand. dbSNP (https://www.ncbi.nlm.nih.gov/snp/rs104895094) lists the gene on the reverse strand.

2) VEP uses the allele provided in the user input. However, for a variant that already exists in Ensembl, it provides consequences for all the alleles, in this case A and C. Click on "HGVS names" for this variant and you will see the consequence calculation for both alleles; there are about 46 HGVS representations for this variant (http://asia.ensembl.org/Homo_sapiens/Variation/Explore?r=16:3242903-3243903;v=rs104895094;vdb=variation;vf=18790831).

3) As for the VEP annotation of your data, we cannot tell what happened unless you provide the details of the problematic variant.

6.4 years ago
Emily 24k

OK, the strand thing. Variant alleles reported by Ensembl for an rsID are always the forward strand alleles. However, if you are looking at variant consequences, and the variant hits a transcript on the negative strand, the relevant alleles are the reverse strand alleles. For this reason, the VEP reports in its output which strand the transcript falls on. It also gives useful information such as the codon and amino acid changes, which reflect the strand that the transcript falls on. This strand column only gives information about the transcript, not the variant. In your input, you can choose to put your alleles on the reverse strand and indicate your strand. You have got -1 for strand in your output because the gene it hits, MEFV, is negative stranded. http://www.ensembl.org/Homo_sapiens/Gene/Variation_Gene/Table?db=core;g=ENSG00000103313;r=16:3242028-3256627;source=dbSNP;v=rs104895094;vdb=variation;vf=18790831
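To make the strand point concrete, here is a minimal sketch of how forward-strand alleles map onto a negative-strand transcript. It is not VEP's own code, and the function names are made up for illustration; it only shows the complement step that explains why codon changes for a -1 transcript look different from the forward-strand alleles.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(allele: str) -> str:
    """Return the reverse complement of a DNA allele string."""
    return allele.translate(COMPLEMENT)[::-1]


def alleles_on_transcript_strand(forward_alleles, transcript_strand):
    """Express forward-strand alleles on the strand of the transcript.

    transcript_strand follows the 1 / -1 convention of the STRAND column.
    This is a hypothetical helper, not part of any Ensembl API.
    """
    if transcript_strand == 1:
        return list(forward_alleles)
    if transcript_strand == -1:
        return [reverse_complement(a) for a in forward_alleles]
    raise ValueError("transcript_strand must be 1 or -1")


# Forward-strand alleles discussed in this thread: T (reference), A and C.
forward = ["T", "A", "C"]
print(alleles_on_transcript_strand(forward, -1))  # ['A', 'T', 'G']
```

The variant itself is still described as T/A/C on the forward strand; only the transcript-relative view (codons, amino acids) uses the complemented bases.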

The reference allele is whatever occurs in the reference genome, in this case T. The alternative alleles are all the other alleles that have been observed at this locus, in this case A and C. The major allele is the most common allele (T), the minor allele is the second most common allele (C) and all other alleles are referred to as rare alleles (A).

The VEP will report consequences depending on what your input is. If your input is a VCF or Ensembl default format file, which includes the variant alleles, the VEP will report consequences for those alleles. It will do this even if you put in a novel allele: it will report what happens with that allele. If, however, you use a list of rsIDs as input, it will give you consequences for all the alleles that it knows about, in this case both C and A, as it doesn't know which one you are interested in.
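As a rough illustration of the three input styles mentioned above, the sketch below builds one line of each for this variant. The helper names are hypothetical, the position 16:3243403 and the T/C alleles are taken from elsewhere in this thread, and the VCF line is reduced to its eight fixed columns with placeholder values.

```python
def vcf_line(chrom, pos, ref, alt, var_id="."):
    """Minimal VCF data line: CHROM POS ID REF ALT QUAL FILTER INFO."""
    return "\t".join([chrom, str(pos), var_id, ref, alt, ".", ".", "."])


def ensembl_default_line(chrom, start, end, ref, alt, strand="+"):
    """Ensembl default format: chromosome start end ref/alt strand."""
    return f"{chrom} {start} {end} {ref}/{alt} {strand}"


# Allele-explicit inputs: consequences are reported for the allele given (C).
print(vcf_line("16", 3243403, "T", "C", "rs104895094"))
print(ensembl_default_line("16", 3243403, 3243403, "T", "C"))

# Identifier-only input: no allele is specified, so every known
# alternative allele (here A and C) is reported.
print("rs104895094")
```

With the first two lines the allele of interest is fixed by the input; with the rsID alone it is not, which is why both A and C show up in the results.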


Thank you very much for your helpful info.

6.4 years ago
Denise CS ★ 5.2k

When I see "variant allele", I read it as the allele at the given variant locus. A variant allele can be reference, ancestral, derived, alternate, transcript allele, wild type, etc. The alternate allele is an allele not observed on the reference assembly (GRCh38, for example). The allele that is on the reference assembly is known as the reference allele.

The minor allele varies depending on the population. Central Asians may have minor allele A, whereas Native Americans may have C, for example. There is also a global minor allele frequency (MAF) across all human populations.
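A small sketch of why the minor allele is a per-population notion: given allele frequencies for each population, the second most common allele can differ between them. The population labels and frequencies below are invented purely for illustration and are not real data for rs104895094; the helper name is hypothetical too.

```python
def minor_allele(freqs):
    """Return the second most common allele, or None if only one is observed.

    freqs maps allele -> frequency within one population. Ties are broken
    by insertion order, which is an arbitrary choice for this sketch.
    """
    observed = [(allele, f) for allele, f in freqs.items() if f > 0]
    ranked = sorted(observed, key=lambda item: item[1], reverse=True)
    if len(ranked) < 2:
        return None
    return ranked[1][0]


# Hypothetical frequencies, NOT real population data.
populations = {
    "population_1": {"T": 0.90, "A": 0.08, "C": 0.02},
    "population_2": {"T": 0.95, "A": 0.01, "C": 0.04},
    "population_3": {"T": 1.00, "A": 0.00, "C": 0.00},
}

for name, freqs in populations.items():
    print(name, minor_allele(freqs))
```

In this made-up example the minor allele is A in one population, C in another, and undefined in a population where only the reference allele is observed, while the reference allele T stays the same throughout.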

VEP computes the consequence by comparing the alternate allele(s) with the reference allele. If you input your information as coordinates (rather than rsIDs or other input formats when you run VEP), the reference allele always goes first. For your example, the reference allele is T. VEP will not "choose" between minor and major alleles (in terms of population frequency, as again this varies). Rather, VEP compares reference versus non-reference (i.e. alternate).

Ensembl "always" reports the allele on the forward strand. There should not be discrepancies between what is shown on the variation page and what the VEP returns, as they are both Ensembl (browser and VEP).

In your example, the reference allele on the forward strand is T, whereas the alternate alleles (on the forward strand) are A and C.

Check the "Explore this variant" page for more help.


Thank you for all the comments. Could you please kindly tell me how I can obtain the minor allele in various populations, say Asian, European, etc?

Regarding the allele and strand: STRAND is defined as the DNA strand (1 or -1) on which the transcript/feature lies (at enter link description here). In my VEP output, the strand for rs104895094 was reported as -1. Here is an example:

variation    Location    Allele-VEP  strand  REF  ALT
rs104895094  16:3243403  A           -1      T    C

Here, REF and ALT were determined from the analysis of my whole-genome sequencing data. Since alleles from genome sequencing, as in my case, are always given on the forward strand, I expected the allele reported by VEP and used to calculate the consequence (Allele-VEP) to be G (if the strand is -1 as shown above), not A. Could you please clarify this for me?


@seta: refer to the kg, kgprod, G5 and G5A tags in the dbSNP VCF.
