Question

Gene annotations of mammalian genomes

3.7 years ago

arsala521 ▴ 50

Hello everyone,

I want to ask about gene annotations in mammalian genomes. Genome sequences with accompanying gene annotations are available for many mammalian species. I want to know whether the gene annotations for non-human mammals are based on transcriptomics data (i.e. whether they have mRNA or EST support) or whether they are homology-based predictions that use the human genome as a reference.

Thanks in advance

Are you wondering whether gene expression data exists for species other than human? If so, the answer is yes: there is plenty of gene expression and protein data for other species, and it is taken into account when building the annotations. You can read more about Ensembl's annotation process, e.g. here.

Reply by Friederike (9.0k), 3.7 years ago

Thank you for sharing a useful paper. It gave me some clues, but it does not answer my question, so let me rephrase it. In the NCBI Assembly database, I found genome assemblies (with full genome representation) for 177 mammalian species, and 84 of these 177 assemblies have RefSeq and/or GenBank annotations. I am trying to find out whether these gene annotations are based on experimental/transcriptomics data or are purely computational predictions. If someone can help me with that, I would be very thankful.

Reply by arsala521 (▴ 50), 3.7 years ago

For most species, it will be a mix of both. Even the human annotation includes many genes that were mostly described and studied in other species. The Vega/Ensembl documentation describes how the supporting evidence is recorded for each gene:

Gene Classification

Genes can be classified according to their status, which indicates the type of evidence that supports the annotation, and their biotype, an indicator of biological significance. For simplicity of display, genes are coloured according to their biotype only, so for example 'Known Protein coding' and 'Novel Protein coding' genes are both shown in the same shade of blue.

The following types of status are used:

Known. Identical to known cDNAs or proteins from the same species and has an entry in species-specific model databases: EntrezGene for human, dog and pig, MGI for mouse, RGD for rat, and ZFIN for zebrafish.
Novel. Identical or homologous to cDNAs from the same species, or proteins from all species.
Putative. Identical or homologous to spliced ESTs from the same species.
Predicted. Based on ab initio prediction and for which at least one exon is supported by biological data (unspliced ESTs, protein sequence similarity with mouse or Tetraodon genomes or expression data from Rosetta).

Genes may have no status shown where this is not applicable, as for example with the majority of pseudogenes.
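
To make the four status categories concrete, here is a minimal Python sketch that maps the kinds of evidence listed above onto a status label. It is only an illustration of the quoted definitions, not how Vega or Ensembl actually assign status: the `Evidence` class, its field names and the `classify_status` function are all hypothetical, and the precedence order (Known, then Novel, then Putative, then Predicted) is my assumption, since the documentation quoted above does not state one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """Hypothetical summary of the evidence behind one gene model.

    All field names are invented for this illustration.
    """
    identical_to_known_sequence: bool = False   # identical to known same-species cDNA/protein
    in_species_database: bool = False           # e.g. an EntrezGene, MGI, RGD or ZFIN entry
    cdna_same_species: bool = False             # identical/homologous same-species cDNA
    protein_any_species: bool = False           # identical/homologous protein, any species
    spliced_est_same_species: bool = False      # identical/homologous same-species spliced EST
    ab_initio_with_supported_exon: bool = False # ab initio model with >= 1 supported exon


def classify_status(ev: Evidence) -> Optional[str]:
    """Return a status label, or None when no status applies.

    The order of the checks is an assumption: the strongest evidence wins.
    """
    if ev.identical_to_known_sequence and ev.in_species_database:
        return "Known"
    if ev.cdna_same_species or ev.protein_any_species:
        return "Novel"
    if ev.spliced_est_same_species:
        return "Putative"
    if ev.ab_initio_with_supported_exon:
        return "Predicted"
    return None  # e.g. most pseudogenes have no status


if __name__ == "__main__":
    # A gene supported only by a protein from another species
    print(classify_status(Evidence(protein_any_species=True)))
```

Read this way, only the Predicted status starts from a computational (ab initio) model, and even that requires at least one exon backed by biological data, while the other three categories rest on cDNA, EST or protein matches.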