How to convert Human Ensembl transcript coordinates to GRCh38
4.5 years ago
Alexandra • 0

Hello, I would like to convert Human Ensembl transcript coordinates to GRCh38. I tried the R package "ensembldb 2.13.1", but it uses an old version of the database (EnsDb.Hsapiens.v86) and is not suitable for my data from Ensembl release 100. Could you recommend a tool for this task?

R python Ensembl transcript conversion • 3.5k views

Ensembl transcripts should be using the latest genome build. Can you provide examples of what you have that you want to convert?


Now I have a csv-file like this:

"ENST","Gene_name","Position"
"ENST00000000233.10","ARF5",133
"ENST00000000233.10","ARF5",145
"ENST00000000233.10","ARF5",82
"ENST00000000412.8","M6PR",153
"ENST00000000442.11","ESRRA",175

and I want to convert these positions to genome coordinates.


Could you please briefly explain how to use BioMart for the conversion? I have previously used it only to export data.


If you have used BioMart before, just cut the first column (the IDs) out of your file and use those IDs to restrict your search.
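A minimal sketch of pulling the ID column out of a CSV like the one above, so the list can be pasted into a BioMart ID filter. The function name `transcript_ids` and the choice to strip the version suffix (`.10`, `.8`, ...) are assumptions for illustration; whether you want versioned or unversioned IDs depends on which BioMart filter you pick.

```python
import csv
import io


def transcript_ids(handle, strip_version=True):
    """Return the unique transcript IDs from the "ENST" column, in file order."""
    seen = {}  # dict keeps insertion order and drops duplicates
    for row in csv.DictReader(handle):
        tid = row["ENST"]
        if strip_version:
            # "ENST00000000233.10" -> "ENST00000000233"
            tid = tid.split(".")[0]
        seen[tid] = None
    return list(seen)


# The example rows from the post; in practice open your own file instead.
example = io.StringIO(
    '"ENST","Gene_name","Position"\n'
    '"ENST00000000233.10","ARF5",133\n'
    '"ENST00000000233.10","ARF5",145\n'
    '"ENST00000000233.10","ARF5",82\n'
    '"ENST00000000412.8","M6PR",153\n'
    '"ENST00000000442.11","ESRRA",175\n'
)
ids = transcript_ids(example)
print("\n".join(ids))
```

This only gives you the ID list for the query; it does not do any coordinate conversion itself.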


Sorry if my questions seem strange, I'm new to bioinformatics. I need to convert an exact position within a transcript (e.g. position 133 in ENST00000000233.10) to a genome coordinate. I do not need the genomic coordinates of the entire transcript.


That you may need to do yourself, and it may require writing some custom code. The methods mentioned here will give you the genomic coordinates of the entire transcript.
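For what it's worth, the custom code does not have to be long once you have the exon coordinates of each transcript (e.g. from BioMart or a GTF). Below is a rough sketch, not a tested tool: the function name `transcript_to_genome` is made up, the exon coordinates in the example are invented (not real ARF5 exons), and it assumes that exons are given as 1-based inclusive genomic (start, end) pairs and that the transcript position is 1-based and counted over the spliced exons from the 5' end.

```python
def transcript_to_genome(pos, exons, strand):
    """Map a 1-based position in a spliced transcript to a genomic coordinate.

    exons  -- list of (start, end) genomic pairs, 1-based, inclusive
    strand -- "+" or "-"
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if pos < 1:
        raise ValueError("transcript positions are assumed to be 1-based")

    # Walk the exons in transcript order: ascending on the plus strand,
    # descending on the minus strand.
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = pos - 1  # 0-based offset still to travel along the transcript
    for start, end in ordered:
        length = end - start + 1
        if remaining < length:
            # The position falls inside this exon.
            return start + remaining if strand == "+" else end - remaining
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")


# Invented two-exon transcript, 10 bases per exon.
toy_exons = [(100, 109), (200, 209)]
print(transcript_to_genome(11, toy_exons, "+"))  # first base of the second exon
print(transcript_to_genome(11, toy_exons, "-"))  # on "-", the lower exon comes second
```

If your positions are relative to the CDS rather than to the whole transcript, you would need to add the 5' UTR length before calling this.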


Thank you for your help!


I have Python code to do this, if it is of any help.


Put it in a GitHub gist and link it as an answer.


I've already written my own code, but it would be great to see the code of a more experienced user. Please share it.

4.5 years ago

Sorry, thought I had posted this yesterday...

The gist is here:

It depends on cgat as well as pandas and numpy. You use it like so:

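 import pandas
 from cgat import GTF
 # TranscriptCoordInterconverter is the class defined in the gist

 coords_to_convert = pandas.read_csv("mycoords.tsv")
 # set_index returns a new DataFrame, so keep the result
 coords_to_convert = coords_to_convert.set_index("ENST")
 for transcript in GTF.transcript_iterator(GTF.iterator(open("my_gtf.gtf"))):
     transcript_id = transcript[0].transcript_id
     if transcript_id not in coords_to_convert.index:
         continue  # no positions requested for this transcript
     converter = TranscriptCoordInterconverter(transcript)
     # [[...]] returns a Series even when only one row matches
     positions = coords_to_convert.loc[[transcript_id], "Position"]
     genome_coords = converter.transcript2genome(positions)
     for pos in genome_coords:
         print(transcript_id, pos)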

It's a bit rusty, written years ago in Python 2.7, but you get the idea. One of these days I'll get round to packaging it up as a proper utility. Presented "as is". No guarantees implied. Caveat emptor.


Thank you very much, it is very useful!
