What causes the differences between gene annotations from different databases?
7.2 years ago
darklings ▴ 580

I want to get information on all genes on the human Y chromosome, but I found that the statistics in different databases (Ensembl/GENCODE, NCBI, HGNC) differ.

For example, the protein-coding gene counts are:

CCDS 63
HGNC 45
Ensembl 63
NCBI 73

So what causes these numbers to differ?

By the way, is the RefSeq gene data the same as the NCBI Homo sapiens annotation release?

next-gen gene database • 2.1k views

Ultimately, HGNC is responsible for all human gene nomenclature. Other databases may add database-specific annotation, but if you need a list of approved genes for the Y chromosome, then HGNC is the authoritative source.


So other databases will contain all official gene symbols and names from HGNC and add their specific annotations?


Yes, but other resources sometimes lag behind HGNC. HGNC occasionally updates official symbols and/or names, and the old ones become synonyms. These changes are not always picked up immediately by other resources; it depends on their update cycle.
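To illustrate the lag described above, here is a minimal Python sketch that maps outdated symbols to their current approved symbols before comparing two gene lists. The field names `symbol` and `prev_symbol` and the pipe-separated format are assumptions about the HGNC download files, so check them against the file you actually use. The gene symbols are made up for the example.

```python
def build_symbol_map(records):
    """Map each approved symbol and each previous symbol to the approved symbol."""
    mapping = {}
    # Approved symbols take priority, so register them first.
    for rec in records:
        mapping[rec["symbol"]] = rec["symbol"]
    for rec in records:
        # Assumption: previous symbols are stored as a pipe-separated string.
        for old in filter(None, rec.get("prev_symbol", "").split("|")):
            mapping.setdefault(old, rec["symbol"])
    return mapping


def to_approved(symbols, mapping):
    """Translate a collection of symbols; unknown symbols pass through unchanged."""
    return {mapping.get(s, s) for s in symbols}


# Hypothetical records standing in for rows of an HGNC table.
hgnc_records = [
    {"symbol": "GENEA2", "prev_symbol": "GENEA|OLDA"},
    {"symbol": "GENEB", "prev_symbol": ""},
]
hgnc_symbols = {"GENEA2", "GENEB"}
other_resource = {"GENEA", "GENEB"}  # still uses the old symbol for GENEA2

mapping = build_symbol_map(hgnc_records)
raw_shared = hgnc_symbols & other_resource
normalized_shared = hgnc_symbols & to_approved(other_resource, mapping)
print(sorted(raw_shared), sorted(normalized_shared))
```

Comparing stable identifiers (HGNC IDs, Ensembl gene IDs) instead of symbols avoids most of this problem when both resources provide them.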

7.2 years ago

The resources you cite do different things and are not necessarily in sync. CCDS identifies annotations of protein-coding regions in the human and mouse genomes that several groups/institutes agree on. HGNC is in charge of assigning official names and symbols to genes. NCBI's RefSeq is a collection of sequences that are annotated as belonging to a gene and/or linked to other NCBI resources. Ensembl provides a full genome annotation that integrates many types of information. Of the resources you cite, Ensembl is the only one that annotates the underlying genome.

When doing a bioinformatics project, select one reference and stick to it. Don't mix and match; that would be asking for trouble. I would recommend using Ensembl because it's much better organized and integrated than the NCBI resources.
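If you settle on one reference, you can reproduce its count yourself. Here is a minimal Python sketch that counts `gene` records of a given biotype on one chromosome in GTF text. It assumes GENCODE-style files label the biotype `gene_type` and name the chromosome `chrY`, while Ensembl-style files use `gene_biotype` and `Y`. The sample lines and gene IDs are invented, and mixing both styles in one input is only done here to exercise both branches.

```python
import re


def count_genes(gtf_lines, chromosome="Y", biotype="protein_coding"):
    """Count distinct gene IDs of one biotype on one chromosome."""
    wanted = {chromosome, "chr" + chromosome}  # accept "Y" and "chrY"
    gene_ids = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene" or fields[0] not in wanted:
            continue
        # Parse key "value" pairs from the attribute column.
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        kind = attrs.get("gene_type") or attrs.get("gene_biotype")
        if kind == biotype and "gene_id" in attrs:
            gene_ids.add(attrs["gene_id"])
    return len(gene_ids)


# Toy input; a real run would iterate over the lines of a downloaded GTF file.
SAMPLE = "\n".join([
    "#!toy header",
    'chrY\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";',
    'chrY\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";',
    'Y\tensembl\tgene\t1000\t2000\t.\t-\t.\tgene_id "G2"; gene_biotype "protein_coding";',
    'chrY\tHAVANA\tgene\t3000\t4000\t.\t+\t.\tgene_id "G3"; gene_type "lncRNA";',
    'chrX\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G4"; gene_type "protein_coding";',
])

print(count_genes(SAMPLE.splitlines()))
```

Counts obtained this way still depend on the release and on how each resource handles special cases such as the pseudoautosomal regions, so record the exact release you used.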
