Question

In VEP annotation, how is the codon field interpreted?

14 months ago

Atefeh ▴ 10

After annotating a VCF file with VEP, we obtain several fields. One of them, Codons, represents the affected codon in the transcript of the gene. Below is an excerpt of insertions from a sample:

     HGVSp_Short    RefSeq          Codons
11   p.A843Gfs*12   NM_001110556.2  gct/gGct
158  p.Q694Pfs*23   NM_052897.4     -/C

Variants and indels affect genes differently depending on the transcript we are looking at. My question is: how should the Codons notation be interpreted for insertions like the ones above? Take -/C as an example. Is the C inserted at the end of the previous codon, or into the next codon of the transcript?

ensembl codon vep protein • 1.5k views

ADD COMMENT • link 14 months ago by Atefeh ▴ 10

Can you also show us the HGVSc changes for these records please?

ADD REPLY • link 14 months ago by Ram 44k

The HGVSc values for those are:

11   c.2527dup
158  c.2080dup

ADD REPLY • link updated 14 months ago by Ram 44k • written 14 months ago by Atefeh ▴ 10

I can't really interpret what's going on. I usually only look at codon changes for coding SNVs/MNVs; the codon annotation for indels looks quite puzzling. Maybe someone from Ensembl can help.

ADD REPLY • link 14 months ago by Ram 44k

Ram · Answer 1 · 2023-11-09

14 months ago

A@Ensembl ▴ 30

Hi

When I annotate the variant MBD6:c.2080dup, I get the following codons: CAG/CCAG. These codons make more sense.

My VEP job: (https://www.ensembl.org/Homo_sapiens/Tools/VEP/Ticket?tl=57dgNnQE69xEm7Hs).

Could you please share how you annotated the variants?

Best wishes,

Aleena
Ensembl

ADD COMMENT • link updated 14 months ago by Ram 44k • written 14 months ago by A@Ensembl ▴ 30

I think OP ran against RefSeq, not Ensembl. Their data has been processed by some tool downstream of VEP (probably vcf2maf), since VEP does not give you the HGVSp_Short field. See: https://github.com/mskcc/vcf2maf/blob/main/docs/vep_maf_readme.txt

The problem is probably not with VEP but with whatever was done afterwards. Even with the RefSeq database, VEP returns a proper Codons annotation: https://www.ensembl.org/Homo_sapiens/Tools/VEP/Ticket?tl=Lk885HixzcK8OlKR

ADD REPLY • link 14 months ago by Ram 44k

Hi Aleena

Thanks for your attention. I did not do the annotation myself; I am using prepared MAF files from TCGA. These are frameshift insertion mutations, so it was unclear to me where I should put the C.

In the second record, the 694th codon is CAG (Q). Which one is correct?

...Ccag...
...cCag...
...caCg...

!!

ADD REPLY • link updated 14 months ago by Ram 44k • written 14 months ago by Atefeh ▴ 10

Duplicate the base at position 2080; that is what the HGVSc says. Aleena already showed you that it should be CAG > CCAG.

There is also a logical way to work this out once you have the cDNA position (2080): divide it by 3 and take the remainder. If it is 0, you are looking at the last base of the codon; if it is 1, the first base; and if it is 2, the second base. 2080 % 3 is 1, so you are looking at the first base of the codon, which is Cag. Duplicate that and you get CCag.

You cannot tell whether it is cCag or Ccag, because there is no way to differentiate which base got inserted: all Cs look alike. HGVS convention says that we should pick the most 3' base, so even if the sequence before the mutation was TCCCAG and it became TCCCCAG owing to the mutation, you pick the insertion point as the last C (tccCCag), not any C before it.
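The remainder rule above can be sketched in a few lines of Python. This is only an illustration, not anything VEP or vcf2maf ships: the function names `codon_position` and `duplicate_base` are made up here, the position is assumed to be a plain 1-based coding (c.) position with no intronic or UTR offset, and the lower-case/upper-case output simply mimics the gct/gGct style from the question.

```python
def codon_position(cdna_pos: int) -> tuple:
    """Map a 1-based coding (c.) position to (codon number, base within codon).

    Assumes a plain coding position, e.g. 2080 from c.2080dup.
    """
    if cdna_pos < 1:
        raise ValueError("coding positions are 1-based")
    codon_number = (cdna_pos - 1) // 3 + 1
    base_in_codon = (cdna_pos - 1) % 3 + 1  # 1 = first, 2 = second, 3 = last base
    return codon_number, base_in_codon


def duplicate_base(codon: str, base_in_codon: int) -> str:
    """Return 'ref/alt' with the duplicated copy of the base in upper case."""
    if len(codon) != 3 or base_in_codon not in (1, 2, 3):
        raise ValueError("expected a 3-base codon and a position of 1, 2 or 3")
    ref = codon.lower()
    i = base_in_codon - 1
    # Keep the original base, then insert its duplicate right after it.
    alt = ref[: i + 1] + ref[i].upper() + ref[i + 1:]
    return f"{ref}/{alt}"


if __name__ == "__main__":
    print(codon_position(2080))      # (694, 1)
    print(duplicate_base("CAG", 1))  # cag/cCag
    print(codon_position(2527))      # (843, 1)
    print(duplicate_base("GCT", 1))  # gct/gGct
```

Both records from the question land on the first base of their codon (694 and 843), which agrees with p.Q694Pfs*23 and p.A843Gfs*12. Placing the upper-case copy after the original base is a choice made here to follow the 3' convention described above; the two placements give the same sequence.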