Convert from Protein position to ENSEMBL Transcript's coding sequence position
8.6 years ago

Hi, I am a computer science engineer and new to bioinformatics.

I have Human mutation data in the following format: UniProt ID, Position, Original Amino Acid, New Amino Acid

I am trying to match this with ExAC data which contains: ENSEMBL Transcript ID, ..., CDS_Start, CDS_End, number of coding base pairs, ... etc

I have a UniProtKB ID to ENSEMBL Transcript ID mapping from one of the sites, but I am unable to map the positions.

How do I convert a protein position to the transcript's CDS_start and CDS_end (the beginning and end of the transcript's coding sequence positions)?

I request your help.

Thank you, Rashmi

protein ExAC UniProt CDS Ensembl • 3.2k views
8.6 years ago
Denise CS ★ 5.2k

The string 'original amino acid, position, new amino acid' looks like HGVS notation to me. You can run your variants through the Ensembl VEP to find out their cDNA and CDS positions. In addition, the VEP will tell you whether the variants are observed in ExAC, 1000 Genomes and ESP, plus clinical significance and plenty more.
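As a rough illustration of getting the data into that shape, here is a minimal Python sketch that turns one record (accession, position, original amino acid, new amino acid) into an HGVS-style protein string such as `p.Arg175His`. The function name `format_hgvs_protein` and the accession `ENSP_EXAMPLE` are made up for this example, and it assumes one-letter amino acid codes in the input. Which identifier types the VEP accepts in front of the `p.` part should be checked against the VEP documentation.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def format_hgvs_protein(accession, position, ref_aa, alt_aa):
    """Build an HGVS-style protein substitution string.

    accession: protein identifier (hypothetical placeholder in the demo below)
    position:  1-based amino acid position
    ref_aa, alt_aa: one-letter amino acid codes
    """
    ref_aa, alt_aa = ref_aa.upper(), alt_aa.upper()
    for aa in (ref_aa, alt_aa):
        if aa not in THREE_LETTER:
            raise ValueError(f"unknown amino acid code: {aa!r}")
    if position < 1:
        raise ValueError("position must be 1-based and positive")
    return f"{accession}:p.{THREE_LETTER[ref_aa]}{position}{THREE_LETTER[alt_aa]}"


# ENSP_EXAMPLE is a placeholder, not a real accession.
print(format_hgvs_protein("ENSP_EXAMPLE", 175, "R", "H"))
```

Writing one such string per line gives a plain text file that can then be tried as VEP input.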

8.6 years ago

You can do this using the Transcript object from the Ensembl API. It has coding_region_start and coding_region_end methods. Or if that doesn't work for you, you can also use the TranscriptMapper object.
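Independently of the API, the position within the coding sequence follows from simple codon arithmetic: amino acid p (1-based) is encoded by CDS bases 3p-2 to 3p. The Python sketch below shows only that step. The function name `protein_to_cds_range` is made up for this example, and it assumes the UniProt sequence is identical to the translation of the chosen Ensembl transcript, which is not guaranteed when the two databases describe different isoforms. It gives positions relative to the start of the CDS, not genomic coordinates; going from there to the genome needs the exon structure, which is what the mapper objects handle.

```python
def protein_to_cds_range(protein_pos, cds_length=None):
    """Return the 1-based (start, end) CDS positions of the codon
    that encodes the amino acid at protein_pos (1-based).

    cds_length is optional; if given, it is used to check that the
    codon actually lies inside the coding sequence.
    """
    if protein_pos < 1:
        raise ValueError("protein_pos must be 1-based and positive")
    cds_start = 3 * (protein_pos - 1) + 1  # first base of the codon
    cds_end = cds_start + 2                # third base of the codon
    if cds_length is not None and cds_end > cds_length:
        raise ValueError("codon lies beyond the end of the coding sequence")
    return cds_start, cds_end


# Amino acid 175 is encoded by CDS bases 523 to 525.
print(protein_to_cds_range(175))
```

Passing the transcript's number of coding base pairs as `cds_length` is a cheap sanity check: a position that falls outside the CDS suggests the UniProt entry and the transcript do not describe the same isoform.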
