Question

Convert from Protein position to ENSEMBL Transcript's coding sequence position

8.6 years ago

rashmi.bangalore123 • 0

Hi. First, I am a computer science engineer and a newbie in bioinformatics.

I have Human mutation data in the following format: UniProt ID, Position, Original Amino Acid, New Amino Acid

I am trying to match this with ExAC data which contains: ENSEMBL Transcript ID, ..., CDS_Start, CDS_End, number of coding base pairs, ... etc

I have the UniProtKB ID to ENSEMBL Transcript ID mapping from one of the sites. But I am unable to map the positions.

How do I convert a protein position to the transcript's CDS_start and CDS_end (the beginning and end positions within the transcript's coding sequence)?

I request your help.

Thank you, Rashmi

protein ExAC UniProt CDS Ensembl • 3.2k views

Updated 8.6 years ago by Emily 24k • written 8.6 years ago by rashmi.bangalore123 • 0

score 1 · Answer 1 · 2016-05-12

The string 'original amino acid, position, new amino acid' looks like HGVS notation to me. You can run your variants through the Ensembl VEP to find out their cDNA and CDS positions. In addition, the VEP will tell you whether the variants are observed in ExAC, 1000 Genomes and ESP, plus clinical significance and plenty more.
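If your data is in separate columns rather than already in HGVS form, a minimal sketch like the one below could build HGVS-style protein strings from it. The function name `to_hgvs_protein` and the accession `ENSP_PLACEHOLDER` are made up for illustration, and it assumes one-letter codes for the 20 standard amino acids. Which identifier types the VEP accepts in front of the `p.` part is something to check in the VEP documentation.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_hgvs_protein(accession, position, original, new):
    """Format a substitution as '<accession>:p.<Ref><pos><Alt>'.

    `accession` is whatever protein identifier you decide to use
    (hypothetical here); `original` and `new` are one-letter codes.
    Raises KeyError for non-standard residues.
    """
    ref = THREE_LETTER[original.upper()]
    alt = THREE_LETTER[new.upper()]
    return f"{accession}:p.{ref}{position}{alt}"


# Example with a placeholder accession (not a real identifier).
print(to_hgvs_protein("ENSP_PLACEHOLDER", 123, "A", "T"))
```

Non-standard residues are left to raise an error on purpose, so that unexpected rows in the mutation file are noticed rather than silently converted.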

score 0 · Answer 2 · 2016-05-12

8.6 years ago

Jean-Karim Heriche 27k

You can do this using the Transcript object from the Ensembl API. It has coding_region_start and coding_region_end methods, which give the boundaries of the transcript's coding region. If that doesn't work for you, you can also use the TranscriptMapper object to convert between coordinate systems.
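For the position within the coding sequence itself, the arithmetic is just codon counting: amino acid n is encoded by CDS bases 3n-2 to 3n. Here is a rough sketch in Python, assuming 1-based positions, that CDS position 1 is the first base of the start codon, and that the UniProt sequence is identical to the translation of the Ensembl transcript you mapped it to (this is not guaranteed, since isoforms can differ). The function names are only for illustration and are not part of the Ensembl API.

```python
def protein_to_cds_range(aa_position):
    """Return (cds_start, cds_end) of the codon for a 1-based residue."""
    if aa_position < 1:
        raise ValueError("amino acid positions are 1-based")
    cds_end = 3 * aa_position
    cds_start = cds_end - 2
    return cds_start, cds_end


def fits_in_cds(aa_position, cds_length):
    """Check that the codon lies within a CDS of `cds_length` bases.

    `cds_length` would be the number of coding base pairs reported
    for the transcript; a failed check hints at an isoform mismatch.
    """
    return protein_to_cds_range(aa_position)[1] <= cds_length


print(protein_to_cds_range(1))    # first codon
print(protein_to_cds_range(123))  # codon of residue 123
```

If `fits_in_cds` fails for a record, or the reference amino acid does not match the translated codon, the UniProt entry and the transcript probably describe different isoforms and the position should not be trusted.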
