Dear all,
I am working on read count files obtained from TCGA. Each read count file contains read counts for 60483 Ensembl IDs.
I used BioMart to convert the Ensembl IDs to gene symbols. However, I found matching gene symbols for only around 19000 of them.
Strangely some EnsemblIDs are not recognized by Ensembl itself.
I have a list of differentially expressed genes in which around 80% of the genes have only an Ensembl ID and no name.
Since I want to use different enrichment and pathway databases, these IDs are a problem: the databases do not recognize them.
Can you advise me how to tackle this problem?
Nazanin
Hello Nazanin,
Could you show us some of the IDs that cannot be converted to a gene symbol?
fin swimmer
Hi,
Sure.
Here are some of my IDs:
ENSG00000011465.15 ENSG00000012223.11 ENSG00000016402.11 ENSG00000034971.13 ENSG00000050767.14 ENSG00000063127.14 ENSG00000064270.11 ENSG00000066382.15
Hello nazaninhoseinkhan,
you could choose two ways to get the gene names:
grep/awk to find the gene names
fin swimmer
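For the grep/awk route, here is a minimal Python sketch of the same idea. It assumes you have a GTF annotation file from the same release that produced the counts, and that its "gene" records carry gene_id and gene_name attributes in the usual GTF style. The function name, the file name in the comment, and the example record are made up for illustration.

```python
import re

# Attribute patterns as they usually appear in column 9 of a GTF file.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')


def gene_names_from_gtf(lines):
    """Build a gene_id -> gene_name dict from the 'gene' records of a GTF."""
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only use gene-level records
        gene_id = GENE_ID_RE.search(fields[8])
        gene_name = GENE_NAME_RE.search(fields[8])
        if gene_id and gene_name:
            mapping[gene_id.group(1)] = gene_name.group(1)
    return mapping


# Made-up record, only to show the expected layout (not a real gene).
example_lines = [
    "##description: hypothetical example\n",
    'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\t'
    'gene_id "ENSG00000000001.1"; gene_type "protein_coding"; gene_name "EXAMPLE1";\n',
]
mapping = gene_names_from_gtf(example_lines)
print(mapping)

# With a real file (hypothetical name) it would look like:
# with open("annotation.gtf") as handle:
#     mapping = gene_names_from_gtf(handle)
```

Because the versioned IDs in the GTF come from the same release as the counts, they can be looked up directly without stripping the version.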
nazaninhoseinkhan: You can go to the current BioMart from the main Ensembl page (no need to go to the hg19 BioMart) and search for the gene IDs without the version numbers.
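Removing the version numbers can be sketched in Python as below. This is only an illustration using the IDs posted above; the helper name strip_version is made up, and it assumes the version is whatever follows the first dot.

```python
def strip_version(ensembl_id: str) -> str:
    """Return the Ensembl ID without its trailing '.N' version suffix."""
    return ensembl_id.split(".", 1)[0]


# IDs taken from the post above.
ids = [
    "ENSG00000011465.15", "ENSG00000012223.11", "ENSG00000016402.11",
    "ENSG00000034971.13", "ENSG00000050767.14", "ENSG00000063127.14",
    "ENSG00000064270.11", "ENSG00000066382.15",
]

unversioned = [strip_version(i) for i in ids]

# One ID per line, ready to paste into a BioMart ID filter.
print("\n".join(unversioned))
```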
Yes, with luck this works. But it wouldn't surprise me if the current Ensembl release has dropped some genes from the former version, or if the official gene symbol has changed.
So I think it is always better to use the same reference assembly in each step. I suggested hg19 here because the version numbers in the examples above are from hg19.
fin swimmer
Ensembl was supposed to have redirects in place for stale Ensembl IDs. It was acknowledged as a problem in a discussion here (possibly before you joined). That fix may have been implemented already.
You could give Ensembl ID converter a try.
Hi,
Unfortunately, when I used the Ensembl ID converter, I got this message: "no stable IDs could be mapped"