How to identify Gene IDs ?
10.6 years ago
jack ▴ 980

I have a bunch of gene IDs and I want to figure out which database these IDs belong to (Entrez, UCSC, Ensembl, ...). Is there any way to do this?

My IDs look like this:

A2M|2
A4GALT|53947
A4GNT|51146
AAA1|404744
AAAS|8086
AACSL|729522

The examples you posted all seem to be gene symbol|Entrez ID pairs (http://www.ncbi.nlm.nih.gov/gene/?term=2+53947+51146+404744+8086+729522[uid]+AND+Human[organism] )

10.6 years ago

The easiest way is to google a couple of them and then look for the ID in the results. GeneCards entries are often useful for this, since they draw on a number of different databases. In your case, those are Entrez IDs.

To elaborate: the numerical IDs after the pipe symbol are Entrez IDs; the alphanumeric IDs before the pipe symbol are HGNC symbols.
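Once you know the format, splitting each line into its two parts is straightforward. This is a minimal Python sketch that assumes every line contains exactly one pipe with a purely numeric Entrez ID after it; the function name `split_symbol_entrez` is only illustrative.

```python
# Example lines in the SYMBOL|ENTREZ_ID format from the question.
lines = ["A2M|2", "A4GALT|53947", "A4GNT|51146", "AAA1|404744", "AAAS|8086", "AACSL|729522"]

def split_symbol_entrez(line):
    """Split 'SYMBOL|ID' into (HGNC symbol, Entrez Gene ID as int).

    Assumes a single '|' separator and a numeric ID; int() raises
    ValueError if the part after the pipe is not a number.
    """
    symbol, entrez_id = line.strip().split("|", 1)
    return symbol, int(entrez_id)

pairs = [split_symbol_entrez(line) for line in lines]
for symbol, entrez_id in pairs:
    print(symbol, entrez_id)
```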

10.6 years ago
Arnaud Ceol ▴ 860

The PSI-MI (HUPO Proteomics Standards Initiative - Molecular Interactions) consortium has developed an ontology to describe protein interactions. This ontology contains a branch for database references, and most database references are associated with a regular expression (id-validation-regexp).

For instance:

  • nucleotide genbank identifier: id-validation-regexp: "[0-9]+"
  • entrez gene: id-validation-regexp: "[0-9]+|[A-Z]{1,2}_[0-9]+|[A-Z]{1,2}_[A-Z]{1,4}[0-9]+"
  • ensembl: id-validation-regexp: "ENS[A-Z]+[0-9]{11}|[A-Z]{3}[0-9]{3}[A-Za-z](-[A-Za-z])?|CG[0-9]+|[A-Z0-9]+\.[0-9]+|YM[A-Z][0-9]{3}[a-z][0-9]"

etc.
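Matching IDs against these patterns can be sketched in a few lines of Python. This is a minimal sketch, assuming each id-validation-regexp has to match the whole identifier (hence `re.fullmatch`); only the three patterns quoted above are included, and the dictionary and function names are hypothetical, not part of PSI-MI.

```python
import re

# Patterns copied from the PSI-MI examples above (id-validation-regexp).
# Only three databases are included; the full ontology lists many more.
PSI_MI_PATTERNS = {
    "genbank nucleotide": r"[0-9]+",
    "entrez gene": r"[0-9]+|[A-Z]{1,2}_[0-9]+|[A-Z]{1,2}_[A-Z]{1,4}[0-9]+",
    "ensembl": r"ENS[A-Z]+[0-9]{11}|[A-Z]{3}[0-9]{3}[A-Za-z](-[A-Za-z])?|CG[0-9]+|[A-Z0-9]+\.[0-9]+|YM[A-Z][0-9]{3}[a-z][0-9]",
}

def candidate_databases(identifier):
    """Return every database whose pattern matches the entire identifier."""
    return [
        name
        for name, pattern in PSI_MI_PATTERNS.items()
        # Assumption: the whole string must match, not just a substring.
        if re.fullmatch(pattern, identifier)
    ]

for example in ["2", "ENSG00000175899", "A2M"]:
    print(example, candidate_databases(example))
```

A bare number such as `2` matches both the GenBank and the Entrez Gene pattern, which illustrates why syntax alone cannot always settle the question.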

You'll find more details on the official website: http://www.psidev.info/groups/molecular-interactions. The EBI also provides a convenient OBO browser: http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI

Unfortunately, this is only useful for distinguishing IDs from databases that use different syntaxes. In other cases (e.g. GenBank protein vs. nucleotide), you will need to query the database.
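One way to query NCBI programmatically is the E-utilities `esummary` endpoint; this is a swapped-in choice, not something the answer above prescribes. The sketch below only builds the request URL (the helper name `esummary_url` is hypothetical); the idea is that you could send the same IDs with `db="nuccore"` and `db="protein"` and see which one returns records. Fetching is left commented out because it needs network access.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"

def esummary_url(db, ids):
    """Build an E-utilities esummary URL for a list of IDs in one NCBI database."""
    params = {"db": db, "id": ",".join(str(i) for i in ids), "retmode": "json"}
    return f"{EUTILS_BASE}?{urlencode(params)}"

url = esummary_url("gene", [2, 53947])
print(url)

# To actually run the query (requires network access):
# import json, urllib.request
# with urllib.request.urlopen(url) as response:
#     summary = json.load(response)
```

NCBI asks heavy users to throttle requests and register an API key, so batch your IDs rather than sending one request per ID.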

10.6 years ago
Katie D'Aco ★ 1.1k

DAVID has a tool to convert gene IDs to those of a particular database. For the input identifier type you can select "not sure" and it will make its best guess. I've had some success with this tool, so it's worth a try if googling doesn't give you quick results.

Edit: it looks like the DAVID database isn't kept up to date, so it may not be very helpful for you.

DAVID is dead. If you can afford it, QIAGEN's Ingenuity Pathway Analysis (IPA) is a useful program. If not, you may find KEGG (Kyoto Encyclopedia of Genes and Genomes) useful.

DAVID is no longer dead. How long it will stay alive this time is another question.
