What Is The Best Source For Gene Function?
13.7 years ago
sa9 ▴ 870

Hi,

I am trying to get a gene summary to annotate variants in exome sequencing data. My gene IDs are based on HGNC nomenclature. I am looking for a gene summary similar to what is available at www.genecards.org (example: http://www.genecards.org/cgi-bin/carddisp.pl?gene=MAGI2#sum).

GeneCards uses Entrez Gene and other sources to feed its summary section. So I decided to go with Entrez Gene and found this file on their FTP site (ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz). However, the file contains only a very short description for each gene compared to what is available on the GeneCards website.
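For reference, a minimal sketch of pulling the short description out of a gene_info-style file, keyed by gene symbol. It assumes the file is tab-delimited with a header line starting with `#tax_id` and containing `Symbol` and `description` columns; the sample rows and the local path in the comment are made up for illustration.

```python
import csv
import io


def read_gene_descriptions(handle):
    """Map gene symbol -> description from a gene_info-style tab-delimited handle."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    header[0] = header[0].lstrip("#")  # header line starts with "#tax_id"
    symbol_col = header.index("Symbol")
    desc_col = header.index("description")
    return {row[symbol_col]: row[desc_col] for row in reader if row}


# Tiny in-memory stand-in for the real file (values are illustrative only).
sample = io.StringIO(
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\tdbXrefs\t"
    "chromosome\tmap_location\tdescription\n"
    "9606\t1\tGENE_A\t-\t-\t-\t1\t-\thypothetical description A\n"
    "9606\t2\tGENE_B\t-\t-\t-\t2\t-\thypothetical description B\n"
)
descriptions = read_gene_descriptions(sample)
print(descriptions["GENE_A"])

# With a downloaded copy (the local path is an assumption):
# import gzip
# with gzip.open("gene_info.gz", "rt") as fh:
#     descriptions = read_gene_descriptions(fh)
```

Looking columns up by header name rather than by position keeps the sketch working if columns are added to the file later.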

I tried BioMart for Ensembl but had the same issue (the Ensembl definition, GO terms and various other attributes didn't yield much information).

Any suggestions for alternative resources?

function • 12k views
13.7 years ago
Will 4.6k

Gene Ontology is probably the industry standard. There you can get structured, well-annotated functions for just about any gene. These are perfect for programmatically determining over-represented functions in a list of genes.
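As a rough illustration of what "over-represented" means here, the sketch below computes a one-sided hypergeometric p-value for a single GO term. This is one common way to test enrichment, not necessarily what any particular GO tool does; the function name and the counts in the example are hypothetical, and multiple-testing correction across terms is left out.

```python
from math import comb


def enrichment_p_value(study_hits, study_size, population_hits, population_size):
    """P(X >= study_hits) when drawing study_size genes without replacement
    from a population in which population_hits genes carry the GO term."""
    max_hits = min(study_size, population_hits)
    total = comb(population_size, study_size)
    tail = sum(
        comb(population_hits, k)
        * comb(population_size - population_hits, study_size - k)
        for k in range(study_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 40 of 20000 background genes carry the term,
# and 5 of the 100 genes in the list of interest carry it.
p = enrichment_p_value(5, 100, 40, 20000)
print(p)
```

A small p-value suggests the term appears in the gene list more often than expected by chance.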

If you're looking for more of a "free-form" description I would suggest WikiGenes. It's not nearly as complete, but it does have quite a bit of information for most genes.


Thanks Will. I will use GO for now. Just out of curiosity, any ideas on how to get the gene summary from the RefSeq or Entrez Gene databases, either via SQL, E-utilities or directly from FTP?
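One possible route via E-utilities is the ESummary endpoint with `db=gene`. The sketch below is untested against the live service: the JSON layout it expects (a `result` object holding a `uids` list plus one record per UID with a `summary` field) is an assumption to verify against a real response, and the helper names are hypothetical. It takes Entrez Gene IDs, so HGNC symbols would need mapping to Gene IDs first.

```python
import json
import urllib.parse
import urllib.request

EUTILS_ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def build_esummary_url(gene_ids):
    """Build an ESummary request URL for a batch of Entrez Gene IDs."""
    params = {
        "db": "gene",
        "id": ",".join(str(g) for g in gene_ids),
        "retmode": "json",
    }
    return EUTILS_ESUMMARY + "?" + urllib.parse.urlencode(params)


def extract_summaries(payload):
    """Pull {gene_id: summary_text} out of a parsed ESummary JSON payload.

    Assumes payload["result"] holds a "uids" list plus one record per UID.
    """
    result = payload.get("result", {})
    return {
        uid: result[uid].get("summary", "")
        for uid in result.get("uids", [])
        if uid in result
    }


def fetch_gene_summaries(gene_ids):
    """Network call: fetch and parse summaries for the given Gene IDs."""
    with urllib.request.urlopen(build_esummary_url(gene_ids)) as response:
        return extract_summaries(json.load(response))


# Offline check with a made-up payload in the assumed layout.
fake_payload = {"result": {"uids": ["111"], "111": {"summary": "example text"}}}
print(extract_summaries(fake_payload))
```

For thousands of genes, the IDs would need to be sent in batches, with pauses between requests to stay within NCBI's usage guidelines.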

13.7 years ago

RefSeq, and resources like GeneCards that depend on RefSeq, are not always the best or most up-to-date source for gene function. I would recommend a combination of resources such as AmiGO, NCBI Gene (see the Related articles in PubMed, GeneRIFs, Phenotypes and Interactions sections), GeneWiki, BioGPS, iHOP, etc. for a better understanding of the function.

Here is an example:

Take a look at the RefSeq annotation for the TRIM38 gene in NCBI / GeneCards (see RefSeq Summary):

The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. The function of this protein has not been identified. [provided by RefSeq]

But there are several experimentally verified functions ascribed to this gene in GOA.

Here, too, you should be aware that GO annotation is evolving rapidly and may not explain the complete functional spectrum of a given gene. It is always good to check the Related articles in PubMed, GeneRIFs, Phenotypes and Interactions sections of the NCBI Gene page for functional aspects not captured by GO.
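To pick out the experimentally supported annotations for a gene from a GOA file, one option is to filter on the evidence code. The sketch below assumes a GAF-style tab-delimited file in which column 3 is the gene symbol, column 5 the GO ID and column 7 the evidence code; the sample lines and GO IDs are invented, and the high-throughput experimental codes are left out.

```python
# Core experimental evidence codes used in GO annotation.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def experimental_go_terms(gaf_lines, symbol):
    """Return the GO IDs annotated to `symbol` with experimental evidence."""
    terms = set()
    for line in gaf_lines:
        if line.startswith("!"):  # GAF comment / header lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue
        if cols[2] == symbol and cols[6] in EXPERIMENTAL_CODES:
            terms.add(cols[4])
    return terms


# Made-up GAF-like lines for illustration only.
sample_gaf = [
    "!gaf-version: 2.2\n",
    "UniProtKB\tX00001\tGENE_A\t\tGO:0000001\tPMID:1\tIDA\t\tF\n",
    "UniProtKB\tX00001\tGENE_A\t\tGO:0000002\tPMID:2\tIEA\t\tP\n",
    "UniProtKB\tX00002\tGENE_B\t\tGO:0000003\tPMID:3\tIMP\t\tP\n",
]
print(experimental_go_terms(sample_gaf, "GENE_A"))
```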

Until there is a community-wide agreement or standard on reporting biological function in manuscripts, the best bet will be consulting various resources to get a cohesive view of functions.


Many thanks for the links, Khader. Unlike using GO, this approach seems impractical for annotating genes in whole-exome sequencing data. However, I can see how it could be very useful when there is a compelling candidate gene (or a few genes) to investigate in more detail.

13.7 years ago
Yannick Wurm ★ 2.5k

www.uniprot.org provides high-quality data.


Actually, UniProt is a combination of the high-quality Swiss-Prot and PIR data and the low-quality TrEMBL data. For purposes like this you will want to check the source.
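One quick way to check the source is the prefix of a UniProt FASTA header: `sp` marks a Swiss-Prot (reviewed) entry and `tr` a TrEMBL (unreviewed) one. A small sketch, with made-up accessions and entry names:

```python
def uniprot_section(fasta_header):
    """Classify a UniProt FASTA header line by its database prefix."""
    prefix = fasta_header.lstrip(">").split("|", 1)[0]
    sections = {
        "sp": "Swiss-Prot (reviewed)",
        "tr": "TrEMBL (unreviewed)",
    }
    return sections.get(prefix, "unknown")


# Hypothetical headers; the accessions and names are placeholders.
print(uniprot_section(">sp|X00001|GENEA_HUMAN Example protein"))
print(uniprot_section(">tr|X00002|X00002_HUMAN Example protein"))
```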

13.7 years ago

You could try GoGene, which takes a gene name as input and then categorizes (e.g. by biological process) and summarizes (e.g. number of abstracts per category term) the abstracts in PubMed associated with that gene name.

I have no idea what MAGI2 does, but GoGene says it's likely to have guanylate kinase activity and PDZ domain binding, to be involved in phosphorylation, and to be found at synapses, the cell membrane, and intercellular junctions.


If you look at the bottom of a GoGene result page, there are a few links (to SIF, GML, GraphML and PubMed IDs). These link out to URLs that appear to call a RESTful-style API, e.g. for MAGI2 in SIF format: http://projects.biotec.tu-dresden.de/gogene/gogene/Search/SIF?q=magi2&type=SIFExportAll. Therefore, you should be able to access GoGene data programmatically, e.g. via wget.
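A minimal sketch of building such URLs for a list of gene names, based only on the single example URL above. Whether the `q` and `type` parameters behave the same for every gene, and whether the service is still reachable, are assumptions; the second gene name is just an example.

```python
from urllib.parse import urlencode

GOGENE_SIF = "http://projects.biotec.tu-dresden.de/gogene/gogene/Search/SIF"


def gogene_sif_url(gene_name):
    """Build the SIF export URL for one gene, mirroring the example link."""
    return GOGENE_SIF + "?" + urlencode({"q": gene_name, "type": "SIFExportAll"})


urls = [gogene_sif_url(name) for name in ["magi2", "trim38"]]
for url in urls:
    print(url)

# Each URL could then be fetched with urllib.request.urlopen(url) or wget.
```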


Thanks Casey. However, I'm looking for resources that can be accessed programmatically to annotate thousands of genes rather than manually searching one by one. GoGene doesn't seem to have an API or downloadable files.

13.1 years ago

Don't forget the genetics perspective. The earlier responses are all valid and useful to predict function or transfer function from a known or tested gene to one that is highly similar in primary sequence. Genetics - via knockout (KO) or knockdown (with interfering RNA), or over-expression - can also reveal phenotypes and other functional characteristics.

Knowing that gene YFG encodes an enzyme that converts A + B to C + H2O is one important aspect of function. Being able to say that YFG also has a role in muscle cell development as revealed by RNAi expts adds another dimension to function. Just as we can transfer GO annotation from a gene whose product was tested in a lab to another (highly) similar gene, we can do the same with phenotypes from genetics expts. So, one can mine mouse KO data to gain insight into human gene function.
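In code, transferring phenotypes this way amounts to joining two tables: human-to-mouse orthologs and mouse gene-to-phenotype annotations. The sketch below uses invented gene names and phenotype labels; where the two tables come from (an orthology resource and a mouse KO phenotype resource) is left open.

```python
def transfer_phenotypes(human_to_mouse, mouse_phenotypes):
    """Attach mouse KO phenotypes to human genes via an ortholog mapping."""
    return {
        human_gene: sorted(mouse_phenotypes.get(mouse_gene, []))
        for human_gene, mouse_gene in human_to_mouse.items()
    }


# Hypothetical lookup tables for illustration only.
orthologs = {"YFG": "Yfg", "GENE_B": "Gene_b"}
ko_phenotypes = {"Yfg": ["abnormal muscle cell development"]}

print(transfer_phenotypes(orthologs, ko_phenotypes))
```

Genes whose ortholog has no recorded phenotype come back with an empty list, so they are easy to tell apart from genes that were never mapped.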
