Ensembl Ref/alt allele discrepancy
17 months ago
patelk26 ▴ 320

Hello,

While exploring the rs72552763 INDEL in Ensembl, I noticed that the Alleles field shows the ref and alt alleles as ATGAT/AT. However, the following line, which gives the location, shows the VCF ref and alt alleles as ATGA and A.

Here's the screenshot for reference.

Why is there a discrepancy in the ref/alt alleles? I am not sure if I am missing something obvious here. Any help in understanding this discrepancy would be highly appreciated!

Thank you very much!

Ensembl INDEL dbSNP VCF variant • 1.1k views
17 months ago
Ben Moore ★ 2.4k

Hi patelk26 - these are just two different ways of representing the in-frame deletion of 'TGA', following either the Ensembl default format or the VCF specification: https://www.ensembl.org/info/docs/tools/vep/vep_formats.html#input


Thank you for your prompt response. This may be a naive question, but I get confused because the dbSNP record indicates that the bases GAT are deleted, unless I am interpreting the record incorrectly. Could you please help clarify this? Thanks again!


17 months ago
Zhenyu Zhang ★ 1.2k

It's both an ATG deletion and a GAT deletion. This is essentially a variant normalization issue. However, when GAT is used, the coordinate should change, so I assume the second report you posted is wrong.


Thank you for your comment. I guess this adds to my confusion. Is it a TGA/GAT deletion, as Ben_Ensembl suggested, or an ATG deletion?


As Zhenyu Zhang said, this is a variant normalisation issue. For example, you could remove 'ATG', 'TGA' or 'GAT' from 'ATGAT' and be left with 'AT' in every case. Different variant notation formats resolve this ambiguity in different ways: VCF conventionally describes a variant using its most 5’ representation, while HGVS format describes a variant at its most 3’ location.
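To make the ambiguity concrete, here is a minimal Python sketch that lists every contiguous deletion turning 'ATGAT' into 'AT'. The function name `deletions_yielding` is made up for illustration and is not part of any Ensembl or VCF tool.

```python
def deletions_yielding(ref: str, alt: str) -> list[tuple[int, str]]:
    """Return (offset, deleted_bases) for every contiguous deletion
    that turns `ref` into `alt`. Hypothetical helper, illustration only."""
    k = len(ref) - len(alt)  # number of bases that must be removed
    hits = []
    for i in range(len(ref) - k + 1):
        # Remove k bases starting at offset i and compare with alt.
        if ref[:i] + ref[i + k:] == alt:
            hits.append((i, ref[i:i + k]))
    return hits


result = deletions_yielding("ATGAT", "AT")
for offset, deleted in result:
    print(f"delete {deleted} at offset {offset}")
```

All three deletions leave the same sequence behind, which is why the choice between them is purely a matter of notation convention.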


@Ben_Ensembl: Searching for POS 160139849 in the reference panel at this link http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chr6.filtered.SNV_INDEL_SV_phased_panel.vcf.gz returns a variant with a POS of 160139848 and with REF and ALT of CATG and C, respectively. Is this the same variant?
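One way to check this is to apply both records to the same stretch of reference and compare the resulting sequences. The sketch below does that under two assumptions that are inferred from the alleles quoted in this thread rather than looked up in the genome: that the reference reads CATGAT starting at 160139848, and that the Ensembl VCF line (ATGA to A) sits at POS 160139849. The helper `apply_vcf_record` is hypothetical.

```python
def apply_vcf_record(window: str, window_start: int,
                     pos: int, ref: str, alt: str) -> str:
    """Apply one VCF-style record (1-based POS) to a reference window.
    Hypothetical helper, illustration only."""
    offset = pos - window_start
    if window[offset:offset + len(ref)] != ref:
        raise ValueError("REF does not match the reference window")
    return window[:offset] + alt + window[offset + len(ref):]


# ASSUMPTION: local reference inferred from the REF alleles in the thread.
WINDOW_START = 160139848
WINDOW = "CATGAT"

# Record reported from the 1000 Genomes panel.
panel_haplotype = apply_vcf_record(WINDOW, WINDOW_START, 160139848, "CATG", "C")
# Record shown by Ensembl (ASSUMPTION: POS is 160139849).
ensembl_haplotype = apply_vcf_record(WINDOW, WINDOW_START, 160139849, "ATGA", "A")

print(panel_haplotype, ensembl_haplotype, panel_haplotype == ensembl_haplotype)
```

If those assumptions hold, both records produce the same alternate sequence, so they would describe the same deletion, with the panel record being the one shifted furthest to the 5’ side.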

