More Ensembl IDs than genes
4.3 years ago
patelk26 ▴ 320

Hello,

This is a very basic question, but I am not sure I understand Ensembl gene IDs. I have a gene quantification matrix, and it contains around 60K Ensembl gene IDs.

How can there be more gene IDs (almost double) than the total number of genes in the human genome? Can multiple gene IDs map to one gene? If yes, what is the purpose of having multiple gene IDs for one gene symbol?

genome next-gen gene • 1.8k views

If those IDs begin with ENST, then they are alternate transcripts generated from a gene rather than genes. A single gene can have multiple such transcripts.
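A quick way to check this is to tally the IDs by prefix. The sketch below is only an illustration: it assumes human Ensembl stable IDs (ENSG for genes, ENST for transcripts, ENSP for proteins), and the names `feature_type`, `count_feature_types` and `example_ids` are made up for this example.

```python
from collections import Counter

# Human Ensembl stable-ID prefixes; a version suffix such as ".17" is optional.
PREFIX_TO_FEATURE = {"ENSG": "gene", "ENST": "transcript", "ENSP": "protein"}


def feature_type(ensembl_id: str) -> str:
    """Return the feature type implied by the first four characters, or 'other'."""
    return PREFIX_TO_FEATURE.get(ensembl_id[:4], "other")


def count_feature_types(ids):
    """Count how many IDs fall into each feature type."""
    return Counter(feature_type(i) for i in ids)


# Illustrative row names, standing in for the first column of a quantification matrix.
example_ids = ["ENSG00000141510.17", "ENST00000269305.9", "ENSG00000012048"]
print(count_feature_types(example_ids))
```

If everything lands in the "gene" bucket, the matrix is gene-level and the extra rows need another explanation.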


I checked; they all begin with ENSG.

4.3 years ago
Ram 44k

This is probably all annotated genes (as quantified from total RNA), not just protein-coding genes. You'd see 58-59K of these, as opposed to around 20K protein-coding genes. The rest are mostly non-coding RNA genes and pseudogenes. The GENCODE GTF file is a useful resource for subsetting to just protein-coding genes.
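As a rough sketch of that subsetting step, the snippet below collects protein-coding gene IDs from GTF lines. It assumes the usual nine tab-separated GTF columns and a `gene_type` attribute (GENCODE) or `gene_biotype` attribute (Ensembl GTFs). The function name `protein_coding_gene_ids` and the toy `SAMPLE_GTF` records are invented for illustration and are not copied from a real release.

```python
import re

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def protein_coding_gene_ids(gtf_lines):
    """Return unversioned gene IDs of 'gene' records whose biotype is protein_coding."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":  # keep gene-level records only
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        # GENCODE uses gene_type; Ensembl GTFs use gene_biotype.
        biotype = attrs.get("gene_type") or attrs.get("gene_biotype")
        if biotype == "protein_coding" and "gene_id" in attrs:
            ids.add(attrs["gene_id"].split(".")[0])  # drop the version suffix
    return ids


# Toy records in GTF layout (illustrative only).
SAMPLE_GTF = [
    "##description: toy example\n",
    'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1";\n',
    'chr1\tHAVANA\tgene\t65419\t71585\t.\t+\t.\tgene_id "ENSG00000186092.6"; gene_type "protein_coding"; gene_name "OR4F5";\n',
    'chr1\tHAVANA\ttranscript\t65419\t71585\t.\t+\t.\tgene_id "ENSG00000186092.6"; transcript_id "ENST00000641515.2"; gene_type "protein_coding";\n',
]

keep = protein_coding_gene_ids(SAMPLE_GTF)
print(sorted(keep))
```

With a real annotation you would pass an open file handle instead of the list (for a gzipped GTF, `gzip.open(path, "rt")`), then keep only the matrix rows whose unversioned ID is in `keep`.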

Can multiple gene IDs map to one gene? If yes, what is the purpose of having multiple gene IDs for one gene symbol?

I think this also happens. Ensembl gives freshly annotated genes new ENSG IDs. Such a gene can share its HGNC symbol with an existing ENSG entry but be located on a patch or an alt contig instead of the primary assembly.
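To see whether this affects a given matrix, you can group gene IDs by symbol and list the symbols that carry more than one ID. This is a minimal sketch assuming you already have (gene ID, symbol) pairs, for example from the `gene_id` and `gene_name` attributes of the annotation. The function name and the placeholder IDs and symbols below are hypothetical.

```python
from collections import defaultdict


def symbols_with_multiple_ids(id_symbol_pairs):
    """Map each symbol that has more than one distinct gene ID to its sorted IDs."""
    by_symbol = defaultdict(set)
    for gene_id, symbol in id_symbol_pairs:
        if symbol:  # ignore genes without a symbol
            by_symbol[symbol].add(gene_id)
    return {s: sorted(ids) for s, ids in by_symbol.items() if len(ids) > 1}


# Placeholder pairs, not real Ensembl records.
pairs = [
    ("ENSG_EXAMPLE_0001", "GENEA"),
    ("ENSG_EXAMPLE_0002", "GENEA"),  # same symbol, e.g. a copy on an alt contig
    ("ENSG_EXAMPLE_0003", "GENEB"),
]
print(symbols_with_multiple_ids(pairs))
```

Checking which contig each duplicated ID sits on (the first GTF column) should show whether the extra copies come from patches or alt contigs.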


Thank you for the explanation, this makes sense.
