Question

How to convert Human Ensembl transcript coordinates to GRCh38

4.5 years ago

Alexandra • 0

Hello, I would like to convert human Ensembl transcript coordinates to GRCh38 genomic coordinates. I tried the R package "ensembldb 2.13.1", but it uses an old version of the database (EnsDb.Hsapiens.v86) and is not suitable for my data, which come from Ensembl release 100. Could you recommend a tool for this task?

R python Ensembl transcript conversion • 3.5k views

ADD COMMENT • link updated 4.5 years ago by i.sudbery 20k • written 4.5 years ago by Alexandra • 0

Ensembl transcripts should be using the latest genome build. Can you provide examples of what you have that you want to convert?

ADD REPLY • link 4.5 years ago by GenoMax 148k

Now I have a csv-file like this:

"ENST","Gene_name","Position"
"ENST00000000233.10","ARF5",133
"ENST00000000233.10","ARF5",145
"ENST00000000233.10","ARF5",82
"ENST00000000412.8","M6PR",153
"ENST00000000442.11","ESRRA",175

and I want to convert these positions to genomic coordinates.

ADD REPLY • link updated 4.5 years ago by GenoMax 148k • written 4.5 years ago by Alexandra • 0

Use BioMart: https://www.ensembl.org/biomart/martview

ADD REPLY • link 4.5 years ago by JC 13k

Could you please briefly explain how to use BioMart for the conversion? So far I have only used it to export data.

ADD REPLY • link 4.5 years ago by Alexandra • 0

If you have used BioMart before, just cut the first column (the transcript IDs) out of your file and use those IDs to restrict your search.
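
As a rough sketch of the "cut the first column" step, the snippet below pulls the distinct transcript IDs out of a CSV shaped like the one posted above. The function name `unique_transcript_ids` and the `strip_version` switch are made up for illustration. Dropping the `.10`-style version suffix is an assumption: whether you need it depends on which BioMart ID filter you paste the list into.

```python
import csv
import io


def unique_transcript_ids(handle, column="ENST", strip_version=True):
    """Return the distinct transcript IDs from a CSV handle, in first-seen order."""
    ids = {}  # dict keys keep insertion order and drop duplicates
    for row in csv.DictReader(handle):
        transcript_id = row[column]
        if strip_version:
            # "ENST00000000233.10" -> "ENST00000000233"
            transcript_id = transcript_id.split(".")[0]
        ids[transcript_id] = None
    return list(ids)


# A few rows copied from the example file in this thread
example = io.StringIO(
    '"ENST","Gene_name","Position"\n'
    '"ENST00000000233.10","ARF5",133\n'
    '"ENST00000000233.10","ARF5",145\n'
    '"ENST00000000412.8","M6PR",153\n'
)
print("\n".join(unique_transcript_ids(example)))
# prints:
# ENST00000000233
# ENST00000000412
```

The printed list can be pasted into the BioMart ID filter box. For a real file, pass `open("mycoords.csv", newline="")` instead of the `StringIO` object (the file name here is a placeholder).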

ADD REPLY • link 4.5 years ago by GenoMax 148k

Sorry if my questions seem strange, I'm new to bioinformatics. I need to convert an exact position within a transcript (e.g. position 133 in ENST00000000233.10) to a genomic coordinate. I do not need the genomic coordinates of the entire transcript.

ADD REPLY • link 4.5 years ago by Alexandra • 0

That you may need to do yourself, and it may require writing some custom code. The methods mentioned here will give you the genomic coordinates of the entire transcript.
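
For what it's worth, the custom code usually boils down to walking the exons in transcript order and subtracting exon lengths until the position falls inside one. Here is a minimal sketch of that idea. It assumes 1-based, inclusive coordinates on both sides and that the position counts from the 5' end of the spliced transcript (UTRs included). The function name and the toy exon coordinates are invented for illustration, not real annotation.

```python
def transcript_to_genome(pos, exons, strand):
    """Map a 1-based transcript position to a 1-based genomic position.

    exons  -- list of (start, end) genomic intervals, 1-based inclusive
    strand -- +1 for the forward strand, -1 for the reverse strand
    """
    if pos < 1:
        raise ValueError("transcript positions are 1-based")
    # Put exons in 5'->3' order: ascending on +, descending on -
    ordered = sorted(exons, reverse=(strand < 0))
    remaining = pos
    for start, end in ordered:
        length = end - start + 1
        if remaining <= length:
            # The position lies in this exon: count in from its 5' edge
            if strand > 0:
                return start + remaining - 1
            return end - remaining + 1
        remaining -= length
    raise ValueError("position is beyond the end of the transcript")


# Toy two-exon transcript (made-up coordinates)
toy_exons = [(100, 109), (200, 209)]
print(transcript_to_genome(11, toy_exons, +1))  # prints 200
print(transcript_to_genome(11, toy_exons, -1))  # prints 109
```

The exon intervals and strand would come from whatever annotation matches your release (a GTF file or a BioMart export), which is the part this sketch leaves out.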

ADD REPLY • link 4.5 years ago by GenoMax 148k

Thank you for your help!

ADD REPLY • link 4.5 years ago by Alexandra • 0

BioMart would be simpler. To do it programmatically, use the REST API: https://rest.ensembl.org/lookup/id/ENST00000000233?content-type=application/json;expand=1
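
Besides the lookup endpoint above, the Ensembl REST API also has a `map/cdna/:id/:region` endpoint that converts cDNA (transcript) coordinates to genomic ones, which is closer to the actual question. A hedged sketch using only the standard library follows. The helper names are invented, stripping the version suffix is an assumption, and the main server follows the current Ensembl release, so results may not match release 100 exactly.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def cdna_map_url(transcript_id, start, end=None):
    """Build the map/cdna URL for one transcript position or range."""
    end = start if end is None else end
    # Assumption: query with the unversioned stable ID
    stable_id = transcript_id.split(".")[0]
    return (
        f"{SERVER}/map/cdna/{stable_id}/{start}..{end}"
        "?content-type=application/json"
    )


def fetch_mappings(transcript_id, start, end=None):
    """Query the REST server and return the 'mappings' list from the reply."""
    with urllib.request.urlopen(cdna_map_url(transcript_id, start, end)) as reply:
        return json.load(reply)["mappings"]


print(cdna_map_url("ENST00000000233.10", 133))
# prints https://rest.ensembl.org/map/cdna/ENST00000000233/133..133?content-type=application/json
```

Calling `fetch_mappings("ENST00000000233.10", 133)` needs network access. Each returned mapping should describe a genomic region (chromosome, start, end, strand), but check the field names against an actual response before relying on them, and keep the request rate low if you loop over a whole file.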

ADD REPLY • link 4.5 years ago by GenoMax 148k

I have python code to do this if it is of any help.

ADD REPLY • link 4.5 years ago by i.sudbery 20k

Put it in a GitHub gist and link it as an answer.

ADD REPLY • link 4.5 years ago by GenoMax 148k

I've already written my own code, but it would be great to see how a more experienced user does it. Please share it.

ADD REPLY • link 4.5 years ago by Alexandra • 0

score 0 · Answer 1 · 2020-05-24

Sorry, thought I had posted this yesterday...

The gist is here:

It depends on cgat as well as pandas and numpy. You use it like so:

 import pandas
 from cgat import GTF
 # TranscriptCoordInterconverter is defined in the gist
 coords_to_translate = pandas.read_csv("mycoords.tsv").set_index("ENST")
 for transcript in GTF.transcript_iterator(GTF.iterator(open("my_gtf.gtf"))):
     transcript_id = transcript[0].transcript_id
     if transcript_id not in coords_to_translate.index:
         continue  # no positions requested for this transcript
     converter = TranscriptCoordInterconverter(transcript)
     # .loc with a list returns a Series even when only one row matches
     positions = coords_to_translate.loc[[transcript_id], "Position"]
     genome_coords = converter.transcript2genome(positions)
     for pos in genome_coords:
         print("%s\t%s" % (transcript_id, pos))

It's a bit rusty, written years ago in Python 2.7, but you get the idea. One of these days I'll get round to packaging it up as a proper utility. Presented "as is". No guarantees implied. Caveat emptor.