6.8 years ago
Learner
Hello,
I asked a question and someone gave an answer that I liked (this is the question: https://www.biostars.org/p/293965/#294344). The problem I have been facing is that there are some genes (about 3000) that I cannot annotate. I am using the same method as described there, and I also tried to convert them via UniProt. I have been looking for a solution but could not find one. Does anybody know how to convert them to gene names? I have posted a few of the ones that I cannot convert.
If there is no solution, can you please explain why?
ENSG00000122718
ENSG00000130201
ENSG00000150076
ENSG00000150526
ENSG00000155640
ENSG00000166748
ENSG00000168260
ENSG00000168787
ENSG00000170590
ENSG00000170803
ENSG00000171484
ENSG00000172381
ENSG00000172774
The problem is that these are retired gene identifiers. If you look them up HERE, you can map them; see the examples below.
ENSG00000166748 = AGBL1
ENSG00000170803 = OR2AG1
@genomax are you aware of any way to annotate them programmatically? It is very hard to annotate 3000 genes one by one.
Why are you using old annotations? Did you align your data against hg19/GRCh37?
@genomax the data were downloaded from TCGA. I did not align them; I just downloaded the HTSeq-count files.
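Before querying anything, the IDs have to be pulled out of the count files. A minimal sketch, assuming the usual two-column, tab-separated HTSeq-count layout with versioned IDs (e.g. `ENSG00000166748.12`) and summary rows starting with `__` (such as `__no_feature`). The helper names `read_htseq_counts` and `batched` are hypothetical, and the batch size of 1000 is an assumed per-request cap for Ensembl REST POST endpoints; check the current limit in the Ensembl REST documentation.

```python
import csv
from typing import Dict, Iterable, Iterator, List


def read_htseq_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse HTSeq-count output into {unversioned gene ID: count}.

    Summary rows such as __no_feature or __ambiguous are skipped, and the
    version suffix (".12" in ENSG00000166748.12) is stripped so the IDs can
    be matched against Ensembl. If two versions collapse to the same ID,
    their counts are summed.
    """
    counts: Dict[str, int] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("__"):
            continue
        gene_id = row[0].split(".", 1)[0]
        counts[gene_id] = counts.get(gene_id, 0) + int(row[1])
    return counts


def batched(ids: List[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` IDs (assumed REST POST cap)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

Usage would be something like `read_htseq_counts(open("sample.htseq.counts"))` (file name hypothetical), then feeding `batched(sorted(counts))` to whichever lookup you use.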
I am not sure what your ultimate aim is, but you are going to be taking a leap of faith by assuming that results from data aligned to an old genome build will translate to the current genome build. For any new work you do, you will likely need to use GRCh38 to be able to publish.
There are REST API endpoints for Ensembl archives. You may want to open a help ticket with Ensembl support if you need help using that API. There may also be past threads on Biostars related to this topic.
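A minimal sketch of the archive route, assuming that `POST /archive/id` on rest.ensembl.org accepts `{"id": [...]}` and returns one record per ID with `id`, `is_current` and `possible_replacement` (a list of objects with a `stable_id` field), and that `POST /lookup/id` accepts `{"ids": [...]}` and returns `{id: record or null}` with a `display_name`. These field names are my understanding of the JSON, not something confirmed in this thread, so check them against the Ensembl REST documentation. All function names are hypothetical helpers.

```python
import json
import urllib.request
from typing import Dict, List, Optional

ENSEMBL_REST = "https://rest.ensembl.org"


def post_json(endpoint: str, payload: dict) -> object:
    """POST a JSON payload to an Ensembl REST endpoint and decode the reply."""
    req = urllib.request.Request(
        ENSEMBL_REST + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def replacements(archive_records: List[dict]) -> Dict[str, List[str]]:
    """Map each queried ID to candidate current IDs.

    Current IDs map to themselves; retired IDs map to whatever is listed in
    'possible_replacement' (which may be empty or contain several IDs).
    """
    result: Dict[str, List[str]] = {}
    for rec in archive_records:
        # is_current may come back as "1" or 1; True also compares equal to 1.
        if rec.get("is_current") in (1, "1"):
            result[rec["id"]] = [rec["id"]]
        else:
            result[rec["id"]] = [p["stable_id"] for p in rec.get("possible_replacement") or []]
    return result


def gene_names(lookup_response: Dict[str, Optional[dict]]) -> Dict[str, Optional[str]]:
    """Extract display_name per ID; IDs the lookup could not resolve map to None."""
    return {gid: (obj or {}).get("display_name") for gid, obj in lookup_response.items()}


def retired_to_names(ids: List[str]) -> Dict[str, List[Optional[str]]]:
    """Old ID -> list of candidate gene names (network calls; keep batches small)."""
    candidates = replacements(post_json("/archive/id", {"id": ids}))
    new_ids = sorted({n for found in candidates.values() for n in found})
    names = gene_names(post_json("/lookup/id", {"ids": new_ids})) if new_ids else {}
    return {old: [names.get(n) for n in found] for old, found in candidates.items()}
```

An empty list in the result means the archive offered no replacement, which is where the coordinate-based approach described next becomes useful.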
I had a similar issue last year. I spoke with Tomas at EBI, and he also directed me to the REST API. Basically, it takes the coordinates of the retired ENSG ID and then uses those coordinates to fetch the new ENSG ID from the latest reference genome.
He highlighted one likely problem: some old IDs may overlap two new IDs, so choosing between them may be an issue.
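One possible tie-breaking rule (my own heuristic, not something Ensembl does for you) is to keep the new gene with the largest overlap with the old gene's coordinates and flag cases where two candidates overlap equally, so those can be checked by hand. The sketch assumes 1-based, inclusive coordinates (as Ensembl reports them) and candidate records with `start` and `end` keys; `overlap_len` and `pick_gene` are hypothetical names.

```python
from typing import List, Optional, Tuple


def overlap_len(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length of the overlap of two closed (inclusive) intervals, 0 if disjoint."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def pick_gene(old_start: int, old_end: int, candidates: List[dict]) -> Tuple[Optional[dict], bool]:
    """Return (best candidate, is_tie) ranked by overlap with the old gene."""
    if not candidates:
        return None, False

    def score(gene: dict) -> int:
        return overlap_len(old_start, old_end, gene["start"], gene["end"])

    ranked = sorted(candidates, key=score, reverse=True)
    tie = len(ranked) > 1 and score(ranked[0]) == score(ranked[1])
    return ranked[0], tie
```

Flagged ties are exactly the ambiguous cases mentioned above; there is probably no automatic answer for those.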
@kennethcondon2007 can you please share how you did it? I am really confused and I don't know how to get their gene names :-(
Unfortunately I never had a chance to implement his advice, but here are the steps I wrote down so I would know where to start when I got back to it:
Ensembl REST API
REST API: MAPPING --> converts coordinates from one assembly to another
REST API: OVERLAP --> retrieves features (e.g. gene IDs) that overlap a given region (warning: you may get more than one object for a region, but it should be rare)
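The two steps above could be chained roughly like this. It is a sketch under several assumptions: that the retired ID still resolves on the GRCh37 server (grch37.rest.ensembl.org, `GET /lookup/id/:id`) to get its old coordinates, that `GET /map/human/GRCh37/<chr>:<start>..<end>:<strand>/GRCh38` returns a `mappings` list whose `mapped` entries carry `seq_region_name`, `start` and `end`, and that `GET /overlap/region/human/<chr>:<start>-<end>?feature=gene` returns gene records with `id` and `external_name`. Verify these against the Ensembl REST docs; every function name here is hypothetical.

```python
import json
import urllib.request
from typing import Dict, List, Optional, Tuple

GRCH37_REST = "https://grch37.rest.ensembl.org"
REST = "https://rest.ensembl.org"


def get_json(url: str) -> object:
    """GET an Ensembl REST URL and decode the JSON reply."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def map_url(chrom: str, start: int, end: int, strand: int) -> str:
    """MAPPING step: GRCh37 region -> GRCh38."""
    return f"{REST}/map/human/GRCh37/{chrom}:{start}..{end}:{strand}/GRCh38"


def overlap_url(chrom: str, start: int, end: int) -> str:
    """OVERLAP step: genes overlapping a GRCh38 region."""
    return f"{REST}/overlap/region/human/{chrom}:{start}-{end}?feature=gene"


def mapped_regions(map_response: dict) -> List[Tuple[str, int, int]]:
    """A region can map as several fragments; keep all of them."""
    return [(m["mapped"]["seq_region_name"], m["mapped"]["start"], m["mapped"]["end"])
            for m in map_response.get("mappings", [])]


def genes_in(overlap_response: List[dict]) -> List[Tuple[str, Optional[str]]]:
    """(gene ID, gene name) pairs; external_name may be missing for some genes."""
    return [(g["id"], g.get("external_name")) for g in overlap_response]


def retired_id_to_genes(old_id: str) -> List[Tuple[str, Optional[str]]]:
    """Old ID -> candidate current (ID, name) pairs; may contain more than one."""
    old = get_json(f"{GRCH37_REST}/lookup/id/{old_id}")
    mapping = get_json(map_url(old["seq_region_name"], old["start"], old["end"], old["strand"]))
    hits: Dict[str, Optional[str]] = {}
    for chrom, start, end in mapped_regions(mapping):
        hits.update(genes_in(get_json(overlap_url(chrom, start, end))))
    return sorted(hits.items())
```

When this returns more than one gene, you are in the "overlaps two new IDs" situation Tomas warned about. Also keep the request rate modest if you loop over ~3000 IDs.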
Sorry I can't be more help.