Question on genes with an Ensembl gene ID but without an associated gene name or corresponding Entrez ID
3
0
10.1 years ago
Yizhao ▴ 10

Hi,

I'm confused about gene mapping between Ensembl IDs (database accessions in Ensembl) and Entrez IDs (gene accessions in NCBI Gene). I've found that not all genes with an Ensembl ID have a corresponding Entrez ID, yet they are still annotated with specific GO terms. Could anybody help me understand this? I'd like to know why this happens and why this kind of gene annotation is considered acceptable.

Any answer or hint will be appreciated. Thanks in advance.

gene-feature ensembl • 8.0k views
ADD COMMENT
0

Hi, all,

Thanks for all your enthusiastic replies.

Sorry for the late follow-up; I was travelling.

Anyway, thanks.

ADD REPLY
2
10.1 years ago

The various organizations use different criteria and algorithms to call genes, so there's no reason to expect any two of them to agree on every gene.

ADD COMMENT
0

So, going by your reply, there's no exact answer to leo.ng's question? I sometimes run into the same problem. I've found that not all genes with an Ensembl ID have a corresponding Entrez ID, yet they are still annotated with specific GO terms. Can someone help me understand this?

ADD REPLY
0

Sure, there are a number of ways of associating a given gene with a GO term. One of them is finding a similar gene with known function, in which case you can borrow its GO terms. Other possibilities include using gene covariation modules to group genes functionally, though I don't know how often this is used in practice.

ADD REPLY
1
10.1 years ago

As has already been pointed out, the gene sets from the different resources do not fully overlap.

To overcome this, we (Ensembl) try to map our gene models to as many external sources as possible.

This way, a gene might not have a corresponding match in RefSeq (Entrez Gene), but it can still map to a UniProt entry. This allows us to assign GO terms from different sources.
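As a rough illustration of the point above (not Ensembl's actual pipeline), the sketch below collects GO terms for a gene through whichever cross-references it has. All identifiers, GO terms, table names and the `go_terms_for` helper are hypothetical placeholders chosen for the example.

```python
# Toy cross-reference tables. Every ID and GO term here is a made-up placeholder.
ensembl_to_entrez = {"ENSG_A": ["1001"], "ENSG_B": []}          # ENSG_B has no Entrez match
ensembl_to_uniprot = {"ENSG_A": ["P_A"], "ENSG_B": ["P_B"]}     # but it maps to UniProt
entrez_go = {"1001": {"GO:0000001"}}
uniprot_go = {"P_A": {"GO:0000001", "GO:0000002"}, "P_B": {"GO:0000003"}}


def go_terms_for(gene_id):
    """Return the union of GO terms reachable through any cross-reference."""
    terms = set()
    for entrez_id in ensembl_to_entrez.get(gene_id, []):
        terms |= entrez_go.get(entrez_id, set())
    for accession in ensembl_to_uniprot.get(gene_id, []):
        terms |= uniprot_go.get(accession, set())
    return terms


# A gene without an Entrez ID still ends up with GO terms via its UniProt entry.
print(sorted(go_terms_for("ENSG_B")))
```

In this toy setup `ENSG_B` has no Entrez ID, yet it still receives a GO term because the UniProt route is independent of the Entrez one.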

Hope that helps,

Magali

ADD COMMENT
0

OK, but it would help if the HGNC, UniProt, Entrez Gene and Ensembl teams got together to sort out which "genes" don't overlap and why, at least for human proteins.

ADD REPLY
0

In a way, isn't this the point of GENCODE? It's a collaboration between Ensembl, UCSC, etc. to annotate genomes. That's about the most definitive source you'll get. I'll also add that, for mouse and human, the new addition of TSLs (transcript support levels) to Ensembl is quite nice in this regard.

ADD REPLY
0
10.1 years ago
cdsouthan ★ 1.9k

This question comes up all the time. While Ryan is right, what you can do in practice is generate consensus sets. However, the numbers depend on which portal you use, what your starting points are, and the order in which you make the intersections.

This simple UniProt query

http://www.uniprot.org/uniprot/?query=database%3A%28type%3Aensembl%29+AND+reviewed%3Ayes+AND+organism%3A%22Homo+sapiens+%28Human%29+[9606]%22+AND+database%3A%28type%3Ageneid%29&sort=score

tells you that 18,324 protein "genes" (in the canonical Swiss-Prot sense) agree with both Entrez Gene IDs (EGIDs) and Ensembl IDs.

Coming from the EGID side

http://www.ncbi.nlm.nih.gov/gene/?term=%22Homo+sapiens%22[porgn]+AND+%22matches+ensembl%22[Properties]

gives 21,569, but these are not all proteins.

The Ensembl coding side gives 20,364 (incl. 509 readthrough).

Note that if the Ensembl/Havana gene build includes an ORF that Entrez Gene does not, it may (I think) sometimes get a GO term via InterPro-to-GO.
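To see why the consensus numbers in this answer differ by starting point, here is a minimal sketch on invented data. The cross-reference triples and the `consensus_counts` helper are hypothetical; real mappings would come from the portals queried above. Because the mappings are many-to-many, the same consensus set yields a different count depending on which ID type you count.

```python
# Hypothetical (UniProt accession, Ensembl gene ID, Entrez Gene ID) triples.
# None marks a missing cross-reference. All IDs are made up for illustration.
xrefs = [
    ("P1", "ENSG1", "10"),
    ("P2", "ENSG2", "20"),
    ("P3", "ENSG2", "20"),   # two proteins mapped to one gene
    ("P4", "ENSG4", None),   # Ensembl gene with no Entrez Gene match
    ("P5", None, "50"),      # Entrez Gene with no Ensembl match
    ("P6", "ENSG6", "60"),
    ("P6", "ENSG7", "60"),   # one protein mapped to two Ensembl genes
    ("P7", "ENSG6", "60"),
]


def consensus_counts(triples):
    """Count distinct IDs of each type among rows that have both gene IDs."""
    consensus = [(u, e, g) for u, e, g in triples if e is not None and g is not None]
    return {
        "uniprot": len({u for u, _, _ in consensus}),
        "ensembl": len({e for _, e, _ in consensus}),
        "entrez": len({g for _, _, g in consensus}),
    }


print(consensus_counts(xrefs))
```

On this toy table the consensus rows contain 5 distinct UniProt accessions, 4 Ensembl genes and 3 Entrez genes, so "how many genes agree" has no single answer without stating which side you count from.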

ADD COMMENT
