I have a bunch of gene IDs and I want to figure out which database these IDs belong to (Entrez, UCSC, Ensembl, ...). Is there any way to do this?
My IDs look like this:
A2M|2
A4GALT|53947
A4GNT|51146
AAA1|404744
AAAS|8086
AACSL|729522
The easiest way is to google a couple of them and look for the ID in the results. GeneCards entries are often useful for this, since they draw on a number of different databases. In your case, those are Entrez IDs.
The PSI-MI (HUPO Proteomics Standards Initiative - Molecular Interactions) consortium has developed an ontology to describe protein interactions. This ontology contains a branch for database references, and most database references are associated with a regular expression (id-validation-regexp).
For instance:
"[0-9]+"
"[0-9]+|[A-Z]{1,2}_[0-9]+|[A-Z]{1,2}_[A-Z]{1,4}[0-9]+"
"ENS[A-Z]+[0-9]{11}|[A-Z]{3}[0-9]{3}[A-Za-z](-[A-Za-z])?|CG[0-9]+|[A-Z0-9]+\.[0-9]+|YM[A-Z][0-9]{3}[a-z][0-9]"
etc.
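As a minimal sketch of how these id-validation-regexp values could be used, the Python snippet below tests each ID against the three example patterns listed above. The labels `pattern_1` to `pattern_3` are hypothetical placeholders, since the examples do not say which database each regexp belongs to; in practice you would take the real database names and regexps from the MI ontology. The sketch assumes each regexp is meant to match the whole ID, so it uses `re.fullmatch`.

```python
import re

# Example id-validation-regexp values copied from the PSI-MI examples above.
# The keys are hypothetical labels: substitute the real database names
# from the MI ontology.
PATTERNS = {
    "pattern_1": r"[0-9]+",
    "pattern_2": r"[0-9]+|[A-Z]{1,2}_[0-9]+|[A-Z]{1,2}_[A-Z]{1,4}[0-9]+",
    "pattern_3": (
        r"ENS[A-Z]+[0-9]{11}|[A-Z]{3}[0-9]{3}[A-Za-z](-[A-Za-z])?"
        r"|CG[0-9]+|[A-Z0-9]+\.[0-9]+|YM[A-Z][0-9]{3}[a-z][0-9]"
    ),
}

# Compile once; fullmatch requires the entire ID to match, not just a prefix.
COMPILED = {name: re.compile(rx) for name, rx in PATTERNS.items()}


def candidate_sources(identifier: str) -> list[str]:
    """Return the labels of every pattern that fully matches the identifier."""
    return [name for name, rx in COMPILED.items() if rx.fullmatch(identifier)]


if __name__ == "__main__":
    for ident in ["53947", "ENSG00000175899", "A2M"]:
        print(ident, candidate_sources(ident))
```

A purely numeric ID such as `53947` matches both `pattern_1` and `pattern_2`, so the result is a list of candidates rather than a single answer: syntax alone cannot separate databases whose IDs look alike.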
You'll find more details on the official website: http://www.psidev.info/groups/molecular-interactions. The EBI also provides a convenient OBO browser: http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI
Unfortunately, this is only useful for distinguishing IDs from databases that use different syntaxes. In other cases (e.g. GenBank protein vs. nucleotide), you will need to query the database.
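For the NCBI Gene (Entrez) case, querying the database can be scripted. The following is a minimal Python sketch, assuming the public E-utilities `esummary` endpoint with `db=gene` and `retmode=json`: it sends a batch of numeric IDs and reads the gene symbol from the `name` field of each returned summary. The helper names are hypothetical, and the JSON layout it relies on (`result`, `uids`, `name`) is an assumption worth checking against NCBI's E-utilities documentation.

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities esummary endpoint.
ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(gene_ids: list[str]) -> str:
    """Build an esummary request for a batch of numeric IDs in db=gene."""
    params = {"db": "gene", "id": ",".join(gene_ids), "retmode": "json"}
    return ESUMMARY + "?" + urllib.parse.urlencode(params)


def symbols_from_summary(payload: dict) -> dict[str, str]:
    """Map each returned UID to its gene symbol (the "name" field).

    Entries without a "name" field are skipped.
    """
    result = payload.get("result", {})
    symbols = {}
    for uid in result.get("uids", []):
        entry = result.get(uid, {})
        if "name" in entry:
            symbols[uid] = entry["name"]
    return symbols


def fetch_symbols(gene_ids: list[str]) -> dict[str, str]:
    """Query NCBI (network access required) and return {uid: symbol}."""
    with urllib.request.urlopen(esummary_url(gene_ids), timeout=30) as resp:
        return symbols_from_summary(json.load(resp))
```

Calling `fetch_symbols(["2", "53947", "51146"])` returns a mapping from each ID to the symbol NCBI has on record, which you can compare with the symbols in your own file; if they agree, the numbers are very likely Entrez Gene IDs. For a long list, send the IDs in batches and stay within NCBI's request-rate limits.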
DAVID has a tool to convert gene IDs to those of a particular database. For the input identifier type you can select "not sure" and it will make its best guess. I've had some success using this tool, so it's worth a try if googling doesn't give you quick results.
Edit: it looks like the DAVID database isn't kept up to date, so it may not be very helpful for you.
The examples you posted all seem to be gene symbol|Entrez ID pairs (http://www.ncbi.nlm.nih.gov/gene/?term=2+53947+51146+404744+8086+729522[uid]+AND+Human[organism] )
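To check a whole file the same way, here is a minimal Python sketch, assuming one `SYMBOL|ID` pair per line as in the question. It splits each line at the pipe and rebuilds an NCBI Gene search URL of the same form as the link above (using `https`), so all IDs can be checked in one browser query. The function names and the `organism` default are illustrative choices, not part of any NCBI tool.

```python
import urllib.parse


def parse_pairs(lines: list[str]) -> list[tuple[str, str]]:
    """Split 'SYMBOL|ID' lines into (symbol, id) tuples, skipping blank lines."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        symbol, _, gene_id = line.partition("|")
        pairs.append((symbol, gene_id))
    return pairs


def gene_search_url(gene_ids: list[str], organism: str = "Human") -> str:
    """Build an NCBI Gene search URL: '<ids>[uid] AND <organism>[organism]'."""
    term = " ".join(gene_ids) + "[uid] AND " + organism + "[organism]"
    return "https://www.ncbi.nlm.nih.gov/gene/?" + urllib.parse.urlencode({"term": term})


if __name__ == "__main__":
    rows = ["A2M|2", "A4GALT|53947", "A4GNT|51146", "AAA1|404744", "AAAS|8086", "AACSL|729522"]
    pairs = parse_pairs(rows)
    print(gene_search_url([gene_id for _, gene_id in pairs]))
```

The URL-encoded brackets (`%5B`, `%5D`) are equivalent to the literal `[uid]` and `[organism]` in the link; the search results list each ID next to its symbol, which you can compare with the left-hand column of your file.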