Annotation issue from Ensembl ID to gene name
6.9 years ago
Learner ▴ 280

Hello,

I asked a question and got an answer that I liked (this is the question: https://www.biostars.org/p/293965/#294344). The problem I am facing is that there are some genes (about 3000) that I cannot annotate. I am using the same method as described there, and I also tried converting them via UniProt. I have been looking for a solution without success. Does anybody know how to convert them to gene names? I have posted a few of the ones that I cannot convert.

If there is no solution, then can you please explain why?

ENSG00000122718
ENSG00000130201
ENSG00000150076
ENSG00000150526
ENSG00000155640
ENSG00000166748
ENSG00000168260
ENSG00000168787
ENSG00000170590
ENSG00000170803
ENSG00000171484
ENSG00000172381
ENSG00000172774
RNA-Seq genome • 1.7k views

The problem is that these are retired gene identifiers. If you look them up HERE you can map them; see the examples below.

ENSG00000166748 = AGBL1
ENSG00000170803 = OR2AG1


@genomax are you aware of any way to annotate them programmatically? It is very hard to annotate 3000 genes one by one.


Why are you using old annotations? Did you align your data against hg19/GRCh37?


@genomax the data were downloaded from TCGA. I did not align them; I just downloaded the htseq-count files.


I am not sure what your ultimate aim is, but you are taking a leap of faith if you assume that results from data aligned to an old genome build will translate to the current genome build. For any new work you end up doing, you will likely need to use GRCh38 to be able to publish.

There are REST API endpoints for the Ensembl archives. You may want to open a help ticket with Ensembl support if you need help using that API. There may also be past threads on Biostars related to this topic.
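As a rough illustration of the archive route, here is a minimal Python sketch that batches stable IDs into a POST request to the Ensembl REST `archive/id` endpoint and summarises the answer. The response field names (`is_current`, `possible_replacement`, `stable_id`) are assumptions based on that endpoint's documented output, and the function names are made up for this example. The server limits how many IDs one POST may carry, so 3000 IDs would have to be split into chunks; check the endpoint documentation for the limit.

```python
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"  # public Ensembl REST server


def build_archive_request(stable_ids):
    """Build a POST request for archive/id covering a batch of stable IDs."""
    body = json.dumps({"id": list(stable_ids)}).encode("utf-8")
    return urllib.request.Request(
        ENSEMBL_REST + "/archive/id",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def summarize_archive(records):
    """Map each ID to (is_current, [replacement IDs]) from parsed JSON records."""
    summary = {}
    for rec in records:
        # Field names are assumed from the endpoint's documented output.
        current = str(rec.get("is_current", "0")) == "1"
        replacements = [r.get("stable_id") for r in rec.get("possible_replacement") or []]
        summary[rec["id"]] = (current, replacements)
    return summary


def fetch_archive(stable_ids):
    """Send the request and return the summary (needs network access)."""
    with urllib.request.urlopen(build_archive_request(stable_ids)) as resp:
        return summarize_archive(json.load(resp))
```

Calling `fetch_archive(["ENSG00000166748", "ENSG00000170803"])` would report whether each ID is still current and which replacement IDs, if any, the archive suggests. A retired ID can come back with no suggested replacement, in which case a coordinate-based approach is the fallback.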


I had a similar issue last year. I spoke with Tomas at EBI and he directed me to the REST API also. Basically what happens is it gets the coords of the retired ENSG and then, using those coords, it grabs the new ENSG from the latest reference genome.

He highlighted one likely problem: some old IDs may overlap two new IDs, so which one to choose may be an issue.
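One possible way to handle that ambiguity is to rank the candidate genes by how many bases they share with the old gene's region and to flag ties for manual review. This is only a heuristic sketch, not something Ensembl prescribes. The function names and the shape of the `genes` list (dicts with `id`, `start`, `end`) are hypothetical.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Bases shared by two inclusive 1-based intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def best_overlapping_genes(region_start, region_end, genes):
    """Return the IDs of the gene(s) with the largest overlap with the region.

    An empty list means nothing overlaps; more than one ID means a tie
    that should be resolved by hand.
    """
    scored = [
        (overlap_length(region_start, region_end, g["start"], g["end"]), g["id"])
        for g in genes
    ]
    best = max((n for n, _ in scored), default=0)
    if best == 0:
        return []
    return [gene_id for n, gene_id in scored if n == best]
```

Largest overlap is a crude criterion; matching strand or biotype as well may be a better tie-breaker depending on the data.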


@kennethcondon2007 can you please share how you did it? I am really confused and I don't know what to do to get their gene names :-(


Unfortunately I never had a chance to implement his advice, but here are the steps I wrote down so I knew where to start when I got back to it:

Ensembl REST API

REST API: MAPPING --> converts coordinates from one assembly to another

REST API: OVERLAP --> retrieves features (e.g. gene IDs) that overlap a given region (warning: you may get more than one object for a region, but it should be rare)

Sorry I can't be more help.
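The two REST steps noted above can be sketched roughly as follows in Python. The sketch assumes that the coordinates of the retired gene are already known and that they are on GRCh37 (neither is confirmed in this thread), and the JSON field names (`mappings`, `mapped`, `seq_region_name`, `external_name`) are taken from the documented output of the `map` and `overlap/region` endpoints. All function names are invented for illustration.

```python
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"  # public Ensembl REST server


def map_url(chrom, start, end, asm_from="GRCh37", asm_to="GRCh38", species="human"):
    """URL for the MAPPING step: convert a region between assemblies."""
    return f"{ENSEMBL_REST}/map/{species}/{asm_from}/{chrom}:{start}..{end}/{asm_to}"


def overlap_url(chrom, start, end, species="human"):
    """URL for the OVERLAP step: genes overlapping a region."""
    return f"{ENSEMBL_REST}/overlap/region/{species}/{chrom}:{start}-{end}?feature=gene"


def get_json(url):
    """GET a URL and parse the JSON reply (needs network access)."""
    req = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def mapped_regions(map_response):
    """Extract (chrom, start, end) tuples from a parsed map response."""
    return [
        (m["mapped"]["seq_region_name"], m["mapped"]["start"], m["mapped"]["end"])
        for m in map_response.get("mappings", [])
    ]


def gene_names(overlap_response):
    """Map gene ID to gene name from a parsed overlap response."""
    return {g["id"]: g.get("external_name") for g in overlap_response}


def lift_and_annotate(chrom, start, end):
    """Run both steps for one region; may return several genes."""
    result = {}
    for c, s, e in mapped_regions(get_json(map_url(chrom, start, end))):
        result.update(gene_names(get_json(overlap_url(c, s, e))))
    return result
```

A region can map to several fragments or to none, and each fragment can overlap more than one gene, so the result of `lift_and_annotate` still needs a rule for choosing among candidates. For thousands of regions the server's rate limits also apply.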
