How Are Gene Synonyms Defined
13.9 years ago

Hi, I'm a computer scientist working on PPI networks. I'm trying to understand the concept of a synonym. How are synonyms defined, and how can one conclude that two gene names are synonyms?

I thought synonyms would refer to the same or almost the same gene, with very similar sequences and interactions, but I found counterexamples. For example, I'm wondering how this situation can exist:

I found a gene in Homo sapiens called MEN1 (official symbol). When I look at its synonyms, I see SCG2, which is the official symbol of a gene that is also in Homo sapiens.

So there are two genes. They are both in Homo sapiens and they are listed as synonyms, yet their interactions are completely different.

I could not find much information about this issue. I'd like to know the definition of a synonym and whether I can assume, in some specific cases, that two genes that are synonyms are the same.

gene protein ppi • 4.8k views
13.9 years ago
Mary 11k

Synonyms are a curation issue. Alternative names arise in a variety of situations: some come from different labs stumbling onto the same gene, some are simple differences in characters (MEN1, MEN 1, MEN-1). Sometimes large-scale projects generated some form of coded designation (such as 1200015E08Rik, a mouse example).

Alternate names are collected by curation teams at various places. Some come from papers read by curators. Some would have come from computational sources. Most of them are shared around, but there may be some sources that have some name that others don't.

There are nomenclature standards that are supposed to be used, and there are committees that establish the rules and assign official names and symbols. However, some people refuse to use those names in publications and revert to their preferred name.

The official names/symbols shouldn't have duplicates, but the synonyms might, so what you are seeing with SCG2 could certainly be 2 different genes.
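To make the point about duplicated synonyms concrete, here is a minimal Python sketch that finds names shared by more than one gene. The `records` table is only an illustration built from names mentioned in this thread (it is not a database export), and the function names are made up for the example.

```python
from collections import defaultdict

# Illustrative table only: official symbol -> synonyms, using names quoted in this thread.
records = {
    "MEN1": ["SCG2"],
    "SCG2": [],
    "MAT1A": ["MAT", "SAMS", "MATA1", "SAMS1"],
}

def names_to_genes(records):
    """Map every name (official or synonym) to the official symbols that carry it."""
    index = defaultdict(set)
    for official, synonyms in records.items():
        index[official].add(official)
        for alias in synonyms:
            index[alias].add(official)
    return index

def ambiguous_names(records):
    """Return only the names that point at more than one official symbol."""
    return {
        name: sorted(genes)
        for name, genes in names_to_genes(records).items()
        if len(genes) > 1
    }

ambiguous = ambiguous_names(records)
print(ambiguous)
```

With this toy table, the only ambiguous name is SCG2, which is both an official symbol and a synonym listed under MEN1.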

EDIT: If you are new to this arena maybe you should see some nomenclature stuff. Here's the mouse stuff, maintained by Jax, where I learned about it: http://www.informatics.jax.org/mgihome/nomen/index.shtml

Here's the HUGO site for human http://www.genenames.org/

13.9 years ago
Will 4.6k

So if you're dealing with gene names, I would suggest only dealing with the Official Symbol or Official Name. Even better would be to only use an unambiguous ID ... personally I prefer RefSeq IDs.

"if I can assume two genes that are synonyms are the same for some specific cases"

In general I would say no. I would suggest indexing by a numeric ID instead of by name/synonym.
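As a rough sketch of what indexing by ID could look like for a PPI network, the Python below keys every node by a stable identifier and refuses names that are not official symbols. The IDs (`G1`, `G2`, ...) and the partner genes `GENE_X` and `GENE_Y` are hypothetical placeholders, not real accessions; in practice you would load the mapping from whichever database you trust.

```python
from collections import defaultdict

# Hypothetical official-symbol -> ID table; the IDs are placeholders, not real accessions.
OFFICIAL_TO_ID = {"MEN1": "G1", "SCG2": "G2", "GENE_X": "G3", "GENE_Y": "G4"}

def resolve(symbol):
    """Return the stable ID for an official symbol; reject anything else."""
    try:
        return OFFICIAL_TO_ID[symbol]
    except KeyError:
        raise KeyError(f"{symbol!r} is not an official symbol; resolve it by hand") from None

def build_network(pairs):
    """Build an undirected interaction network whose nodes are IDs, not names."""
    network = defaultdict(set)
    for a, b in pairs:
        id_a, id_b = resolve(a), resolve(b)
        network[id_a].add(id_b)
        network[id_b].add(id_a)
    return network

network = build_network([("MEN1", "GENE_X"), ("SCG2", "GENE_Y")])
```

Because synonyms never enter the network, MEN1 and SCG2 stay separate nodes even though one is listed as a synonym of the other.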

I'm not sure of the official definition of synonyms, but they arose from the fact that researchers were working on genes before the various genomes were complete. Researchers were studying a gene in a particular context (cell cycle, cell signalling, etc.) and gave the gene a name consistent with its function in that context. As the genome was completed, or at least during the growth of sequence repositories, different researchers started realizing that they were studying the same gene and now had to resolve all of those names into a unified system.

13.9 years ago

Yes, I would recommend the HUGO site, as Mary writes. There are similar groups, to the best of my knowledge, for fly, worm, yeast, and perhaps some others (Arabidopsis?). Mammalian genomes tend to use the human name, but not always.

There are two aspects to synonyms: gene names and gene symbols. Human MAT1A is also known by MAT; SAMS; MATA1; SAMS1; MAT1A. That gene's name is methionine adenosyltransferase I, alpha, but it could also be S-adenosylmethionine synthase isoform type-1; S-adenosylmethionine synthetase isoform type-1; adoMet synthase 1; adoMet synthetase 1; methionine adenosyltransferase 1; or methionine adenosyltransferase I/III.

By the way, one of the worst gene symbols has got to be T. Others are annoying or humorous because they spell words or have an alternate meaning under another usage (check the alias).


A particularly problematic symbol in mouse was "a" (http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=markerDetail&key=3).

Grrr.... And there's that whole controversy of Excel sheets changing items like Oct8 to dates. There's even a paper on that.


Yes, I almost added the MARCH, SEPT, OCT and DEC genes to the list above, but that is not really a problem in my mind with gene symbols, but a problem with Excel.
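If you want to catch such symbols before a list goes anywhere near a spreadsheet, a simple check along these lines may help. This is only a heuristic sketch: the regular expression is my own guess at "month name or abbreviation followed by one or two digits", not a description of Excel's actual parsing rules, and the function name is made up.

```python
import re

# Heuristic: a month name/abbreviation, an optional hyphen, then 1-2 digits.
# This is an assumption about risky patterns, not Excel's real conversion logic.
DATE_LIKE = re.compile(
    r"^(JAN|FEB|MAR(CH)?|APR(IL)?|MAY|JUNE?|JULY?|AUG|SEPT?|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)

def looks_like_date(symbol):
    """Return True if a gene symbol resembles something a spreadsheet may read as a date."""
    return DATE_LIKE.match(symbol) is not None

symbols = ["MARCH1", "SEPT2", "Oct8", "DEC1", "MEN1", "MAT1A"]
risky = [s for s in symbols if looks_like_date(s)]
print(risky)
```

Symbols flagged this way can be imported as text or checked by eye after any round trip through a spreadsheet.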

13.9 years ago
hurfdurf ▴ 490

The canonical gene identifiers (as defined by HUGO, RefSeq, Ensembl, etc.) have to be unique. Synonym lists are used to aid in looking at legacy publications, which use old, imprecise names. In the case of SCG2, it's probable that MEN1 was somehow conflated with SCG2 in the literature, so some of the SCG2 papers are actually referring to MEN1.

Aliases, as a list of all the names that were ever applied to a gene, can't be unique and non-overlapping. There are plenty of gene families that were once known under a single name, thus all the genes will have the same (incorrect) alias.

13.9 years ago

Most tools that predict protein-protein interaction networks, such as STRING, make use of automated literature parsing and are easily confused by synonyms and by genes with similar names. One case I found is the ALG2 gene, which STRING confuses with ALG-2, a gene with an almost identical name.
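A small Python sketch shows why this kind of confusion is hard to avoid with simple text matching. It applies a naive normalization (uppercase, drop spaces and hyphens), which is my own illustration and not how STRING or any other tool actually works, and then lists the names that collapse together.

```python
import re
from collections import defaultdict

def normalize(name):
    """Naive normalization: uppercase and drop whitespace and hyphens."""
    return re.sub(r"[\s\-]", "", name).upper()

def collisions(names):
    """Group names by normalized form and keep groups with more than one spelling."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize(name)].add(name)
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}

result = collisions(["ALG2", "ALG-2", "MEN1", "MEN 1", "MEN-1", "SCG2"])
print(result)
```

The MEN1 spellings are harmless variants of one gene, while ALG2 and ALG-2 are different genes, yet the normalization treats both groups the same way. Telling them apart needs curated information, not string handling.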

There is no trivial solution to your problem. You should always use the official HGNC name (HUGO Gene Nomenclature Committee). In the cases that you described, where you find different annotations for different synonyms of the same gene, you can try to merge them all together and then proceed with a manual curation of the interaction network. That's the only way to avoid a lot of false positives.
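One possible way to organize the merge-then-curate step is sketched below in Python. Everything in it is hypothetical: the gene names, the synonym table and the rule "review any interaction that was never reported under the official symbol" are assumptions chosen for illustration, and you may prefer a different review rule.

```python
from collections import defaultdict

# Hypothetical synonym table and interaction records, for illustration only.
NAME_TO_OFFICIAL = {"GENE_A": "GENE_A", "GA1": "GENE_A", "GA-OLD": "GENE_A"}
raw = [("GENE_A", "P1"), ("GA1", "P2"), ("GA-OLD", "P1"), ("UNKNOWN", "P9")]

def merge_for_curation(raw, name_to_official):
    """Merge interactions under the official symbol, remembering which names reported them."""
    merged = defaultdict(lambda: defaultdict(set))  # official -> partner -> names seen
    unresolved = []
    for name, partner in raw:
        official = name_to_official.get(name)
        if official is None:
            unresolved.append((name, partner))  # unknown name: leave for a human
            continue
        merged[official][partner].add(name)
    return merged, unresolved

def needs_review(merged):
    """List interactions supported only by synonyms, never by the official symbol."""
    return sorted(
        (official, partner)
        for official, partners in merged.items()
        for partner, names in partners.items()
        if official not in names
    )

merged, unresolved = merge_for_curation(raw, NAME_TO_OFFICIAL)
review = needs_review(merged)
```

Keeping the reporting names alongside each merged interaction gives the curator the provenance needed to accept or reject it.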
