Gene Id Conversion Tool

62

16.0 years ago

Renee ▴ 620

Hey,

I was using DAVID (http://david.abcc.ncifcrf.gov/conversion.jsp) for gene ID conversion, e.g. between Agilent IDs, GenBank accession IDs and Entrez Gene IDs, but I found that the DAVID database is not up to date. Does anyone know a better, more current conversion tool for this job? Thanks!

david • 297k views

ADD COMMENT • link updated 2.0 years ago by Ram 45k • written 16.0 years ago by Renee ▴ 620

0

How frequently do you need things updated? DAVID has had yearly releases so far, and their latest release is this month (March 2010). See the release announcement here: http://david.abcc.ncifcrf.gov/forum/cgi-bin/ikonboard.cgi?act=ST;f=10;t=25 This suggests the underlying mapping framework will be updated along with it in the 6.7 beta, so the conversion tool should include more recent information.

ADD REPLY • link updated 13.0 years ago by Istvan Albert 103k • written 15.7 years ago by User 59 13k

0

Hi

I am facing the same problem. I did a differential gene expression analysis using the protocol "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown".

After running Ballgown I have a gene list file.

The gene IDs in this file look like this:

id
MSTRG.28632
MSTRG.3615
MSTRG.7507
MSTRG.70532
MSTRG.49954
MSTRG.60656
MSTRG.34410

Next I want to perform gene ontology analysis using the tool agriGO, but these gene IDs are not recognized by any database.

I have used the tool bioDBnet to convert these IDs into Ensembl gene IDs, but got no results.
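
One thing that may help here: MSTRG.* identifiers are assigned by StringTie itself, so no public database will know them. If the merge step was run with a reference annotation (`-G`), transcripts that match a known gene usually carry an extra `ref_gene_id` attribute in the merged GTF. A minimal Python sketch under that assumption (the file name `stringtie_merged.gtf` and the function name are hypothetical) for building an MSTRG-to-reference lookup:

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')

def mstrg_to_reference(gtf_lines):
    """Map each StringTie gene_id (e.g. MSTRG.3615) to the set of
    reference gene IDs found on its transcripts (the set may be empty)."""
    mapping = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only transcript records carry the ref_gene_id attribute we want
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        refs = mapping[gene_id]  # registers the gene even without a match
        if "ref_gene_id" in attrs:
            refs.add(attrs["ref_gene_id"])
    return dict(mapping)

# Hypothetical usage:
# with open("stringtie_merged.gtf") as handle:
#     lookup = mstrg_to_reference(handle)
```

Genes that end up with an empty set have no reference match and would need to be annotated some other way before a GO tool can use them.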

ADD REPLY • link updated 6.2 years ago by Ram 45k • written 6.6 years ago by fatimarasool135 ▴ 90

0

I would say that the suffix numbers are Entrez IDs (Gene IDs).

ADD REPLY • link 5.1 years ago by rpg • 0

54

14.5 years ago

Casey Bergman 18k

The bioDBnet and Hyperlink Management System (HMS) systems convert multiple ID sets to each other.

HMS is limited to three species (human, mouse, ciona) and has fewer data sources (Agilent - no, GenBank and Entrez - yes).

The bioDBnet system appears to be species-neutral, and its network of linked databases includes Agilent, GenBank and Entrez, so it should fit your requirements. [Image: bioDBnet network of linked databases]

ADD COMMENT • link 14.5 years ago by Casey Bergman 18k

37

11.0 years ago

adam.maikai ▴ 530

MyGene.info is a web service that provides up-to-date annotations in several fields and is great for gene ID conversion. All species from NCBI and Ensembl are supported, and annotations are updated weekly. Both the Python and R/Bioconductor clients are easy to use.

MyGene.info may not be able to solve your problem with Agilent IDs, but several other IDs from GenBank, UniProt, Ensembl and RefSeq are available. Also, from either client, you can query several thousand genes at once.

Here is some example syntax for ID conversion from the Python module:

>>> import mygene
>>> mg = mygene.MyGeneInfo()
>>> mg.metadata['available_fields'] ## returns available query terms, pay special attention to "ensemblgene", "entrezgene", "symbol" and "uniprot"
[u'accession', u'alias', u'biocarta', u'chr', u'end', u'ensemblgene', u'ensemblprotein', u'ensembltranscript', u'entrezgene', u'exons', u'flybase', u'generif', u'go', u'hgnc', u'homologene', u'hprd', u'humancyc', u'interpro', u'ipi', u'kegg', u'mgi', u'mim', u'mirbase', u'mousecyc', u'name', u'netpath', u'pdb', u'pfam', u'pharmgkb', u'pid', u'pir', u'prosite', u'ratmap', u'reactome', u'reagent', u'refseq', u'reporter', u'retired', u'rgd', u'smpdb', u'start', u'strand', u'summary', u'symbol', u'tair', u'taxid', u'type_of_gene', u'unigene', u'uniprot', u'wikipathways', u'wormbase', u'xenbase', u'yeastcyc', u'zfin']

>>> xli = ['DDX26B','CCDC83', 'MAST3', 'RPL11', 'ZDHHC20', 'LUC7L3', 'SNORD49A', 'CTSH', 'ACOT8']
>>> mg.querymany(xli, scopes="symbol", fields=["uniprot", "ensembl.gene", "reporter"], species="human", as_dataframe=True)

A DataFrame is returned:

	Finished.
	_id ensembl.gene \
	query
	DDX26B 203522 ENSG00000165359
	CCDC83 220047 ENSG00000150676
	MAST3 23031 ENSG00000099308
	RPL11 6135 ENSG00000142676
	ZDHHC20 253832 ENSG00000180776
	LUC7L3 51747 ENSG00000108848
	SNORD49A 26800 [ENSG00000277370, ENSG00000175061]
	CTSH 1512 ENSG00000103811
	ACOT8 10005 ENSG00000101473

	reporter \
	query
	DDX26B {u'HG-U95B': u'53886_at', u'GNF1H': u'gnf1h144...
	CCDC83 {u'GNF1H': [u'gnf1h06565_at', u'gnf1h09743_at'...
	MAST3 {u'HG-U133_Plus_2': u'213045_at', u'HG-U95Av2'...
	RPL11 {u'GNF1H': u'200010_at', u'HG-U133_Plus_2': u'...
	ZDHHC20 {u'HG-U133_Plus_2': [u'225365_at', u'243786_at']}
	LUC7L3 {u'HG-U95B': [u'55032_at', u'57642_at'], u'HG-...
	SNORD49A {u'HG-U133_Plus_2': [u'225065_x_at', u'239754_...
	CTSH {u'HG-U133_Plus_2': u'202295_s_at', u'HG-U95Av...
	ACOT8 {u'HG-U95B': u'47789_at', u'HG-U133_Plus_2': [...

	uniprot
	query
	DDX26B {u'Swiss-Prot': u'Q5JSJ4'}
	CCDC83 {u'Swiss-Prot': u'Q8IWF9', u'TrEMBL': u'H0YDV3'}
	MAST3 {u'Swiss-Prot': u'O60307', u'TrEMBL': u'V9GYV0'}
	RPL11 {u'Swiss-Prot': u'P62913', u'TrEMBL': [u'Q5VVC...
	ZDHHC20 {u'Swiss-Prot': u'Q5W0Z9', u'TrEMBL': u'B4DRN8'}
	LUC7L3 {u'Swiss-Prot': u'O95232', u'TrEMBL': [u'A8K3C...
	SNORD49A NaN
	CTSH {u'Swiss-Prot': u'P09668', u'TrEMBL': [u'E9PKT...
	ACOT8 {u'Swiss-Prot': u'O14734', u'TrEMBL': [u'E9PIS...
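
When `as_dataframe` is left off, `querymany` returns a list of dicts instead, and as the SNORD49A row shows, the Ensembl field can hold either one gene or several. A small sketch for flattening that into a symbol-to-Ensembl lookup; the helper name `ensembl_ids` and the sample records are made up for illustration, and the record layout is assumed to follow the output above:

```python
def ensembl_ids(hit):
    """Return a list of Ensembl gene IDs from one querymany-style hit."""
    ensembl = hit.get("ensembl")
    if ensembl is None:            # no Ensembl annotation, or query not found
        return []
    if isinstance(ensembl, dict):  # single gene: {"gene": "ENSG..."}
        ensembl = [ensembl]
    return [entry["gene"] for entry in ensembl if "gene" in entry]

# Sample records shaped like the hits above (assumed layout; IDs copied from the table)
hits = [
    {"query": "RPL11", "_id": "6135", "ensembl": {"gene": "ENSG00000142676"}},
    {"query": "SNORD49A", "_id": "26800",
     "ensembl": [{"gene": "ENSG00000277370"}, {"gene": "ENSG00000175061"}]},
    {"query": "NOT_A_GENE", "notfound": True},
]

symbol_to_ensembl = {hit["query"]: ensembl_ids(hit) for hit in hits}
for symbol, ids in symbol_to_ensembl.items():
    print(symbol, ids)
```

Keeping every value as a list avoids special-casing the one-to-many symbols later on.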

And now for the Bioconductor package:

library(mygene)
xli <-  c('DDX26B','CCDC83',  'MAST3', 'RPL11', 'ZDHHC20',  'LUC7L3',  'SNORD49A',  'CTSH', 'ACOT8')
queryMany(xli, scopes="symbol", fields=c("uniprot", "ensembl.gene", "reporter"), species="human")

This returns a DataFrame:

Finished
DataFrame with 9 rows and 5 columns
                     ensembl.gene         _id uniprot.Swiss-Prot uniprot.TrEMBL       query
                  <CharacterList> <character>        <character>         <List> <character>
1                 ENSG00000165359      203522             Q5JSJ4       ########      DDX26B
2                 ENSG00000150676      220047             Q8IWF9       ########      CCDC83
3                 ENSG00000099308       23031             O60307       ########       MAST3
4                 ENSG00000142676        6135             P62913       ########       RPL11
5                 ENSG00000180776      253832             Q5W0Z9       ########     ZDHHC20
6                 ENSG00000108848       51747             O95232       ########      LUC7L3
7 ENSG00000277370,ENSG00000175061       26800                 NA       ########    SNORD49A
8                 ENSG00000103811        1512             P09668       ########        CTSH
9                 ENSG00000101473       10005             O14734       ########       ACOT8

ADD COMMENT • link updated 6.2 years ago by Ram 45k • written 11.0 years ago by adam.maikai ▴ 530

4

That's a pretty neat service. You should post this as a separate tool announcement.

ADD REPLY • link updated 3.8 years ago by Ram 45k • written 11.0 years ago by Istvan Albert 103k

1

There is already a request for including Agilent reporter IDs in MyGene.info:

https://bitbucket.org/sulab/mygene.info/issue/1/support-for-agilent-platform-reporters

Please leave a comment there if you need any specific platforms to be included.

ADD REPLY • link updated 3.8 years ago by Ram 45k • written 11.0 years ago by Newgene ▴ 370

0

Hi,

I am trying to convert apple gene IDs, e.g. MDP0000006982, MDP0000007682, MDP0000799306, MDP0000799753, into Entrez IDs and Ensembl gene IDs using the R/Bioconductor package "mygene", but the function "queryMany" is not working and keeps giving an error.

Can anyone help me convert these IDs? Please suggest a suitable tool or a way to fix the issue described above.

Thanks

ADD REPLY • link updated 3.8 years ago by Ram 45k • written 5.9 years ago by rj.rezwan ▴ 20

1

thanks a lot for this tool. I couldn't find any other easy way to do this for sheep data!

ADD REPLY • link 3.9 years ago by sylvesterholt ▴ 10

20

15.7 years ago

Michael 56k

BioMart has already been mentioned. It can do much more than ID conversion but it is very useful for conversion purposes, it is regularly updated and you can select different genome builds and all kinds of genomic features. It seems to me that you wish to retrieve GeneIDs linked to Affymetrix IDs. To select these attributes in BioMart: go to the Martview page to start a new BioMart query.

Select attributes on the attribute page: The Ensembl GeneIDs and Transcript IDs are default. Ensembl GeneID and Affy IDs are under the "External" tab. Select your chip there. To limit to those genes which are on the chip, use the Filters->Gene menu. You can limit the genes to those present on various platforms or your favourite set.

There is a URL button in BioMart that lets you retrieve a URL for your query and pass it on to others. Try this example:

BioMart URL (that should be a good starting point).
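
The same kind of query can also be scripted against BioMart's `martservice` endpoint, which accepts an XML description of the dataset, filters and attributes. Below is a rough Python sketch. The dataset, filter and attribute names (`hsapiens_gene_ensembl`, `affy_hg_u133_plus_2`, `entrezgene_id`) are assumptions that can differ between Ensembl releases, so check them in the Martview interface first; the function names are illustrative only.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import quoteattr

MARTSERVICE = "http://www.ensembl.org/biomart/martservice"

def build_query(dataset, filters, attributes):
    """Build a BioMart XML query; filters maps filter name -> list of values."""
    filter_xml = "".join(
        "<Filter name=%s value=%s/>" % (quoteattr(name), quoteattr(",".join(values)))
        for name, values in filters.items()
    )
    attribute_xml = "".join("<Attribute name=%s/>" % quoteattr(a) for a in attributes)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="1" '
        'uniqueRows="1" datasetConfigVersion="0.6">'
        '<Dataset name=%s interface="default">%s%s</Dataset></Query>'
        % (quoteattr(dataset), filter_xml, attribute_xml)
    )

def run_query(xml):
    """Send the query and return the tab-separated result as text."""
    url = MARTSERVICE + "?" + urllib.parse.urlencode({"query": xml})
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

# Assumed names: Affymetrix HG-U133 Plus 2 probe sets -> Ensembl and Entrez gene IDs
query = build_query(
    "hsapiens_gene_ensembl",
    {"affy_hg_u133_plus_2": ["202295_s_at", "213045_at"]},
    ["affy_hg_u133_plus_2", "ensembl_gene_id", "entrezgene_id"],
)
print(query)
# print(run_query(query))  # needs network access
```

Building the XML separately from sending it makes it easy to inspect the query, or paste it into the BioMart web form, before running it.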

If you are interested in KEGG identifiers (Pathways, Genes), EC numbers, etc., the KEGG Identifier page could be handy, because the KEGG IDs are not in BioMart as far as I know.

ADD COMMENT • link 15.7 years ago by Michael 56k

0

thank you Michael, this is so helpful for me

ADD REPLY • link 9.2 years ago by debitboro ▴ 270

8

15.7 years ago

Perry ▴ 290

BridgeDB provides a nice API and REST interface, so you can put ID mapping queries in your scripts.

ADD COMMENT • link 15.7 years ago by Perry ▴ 290

0

BridgeDb is really a software framework that you can use in your own code, either directly (currently only in Java) or by calling it as a web service. It can use different and even multiple stacked mappings. By default these come from Ensembl (for gene products) and HMDB (for metabolites). Ongoing projects extend the available mappings with ChemSpider and SNP info. There is a short introduction available at Nature Precedings: http://precedings.nature.com/documents/5023/version/1 and a paper in BMC Bioinformatics: http://dx.doi.org/10.1186/1471-2105-11-5

ADD REPLY • link updated 6.1 years ago by Ram 45k • written 14.7 years ago by Chris Evelo 10k

7

15.8 years ago

Giovanni M Dall'Olio 28k

You can also do it with the following services:

UniProt - click on 'Id Mapping' from the home page.
BioMart - choose a database and a version, then put the IDs you want to convert under Filters->Id List limit (select the proper input ID type in the menu), and then select the output IDs under 'Attributes'. BioMart is a general tool that lets you extract a lot of different information from databases (sequences, ontologies, transcripts, homologues), but it may be a bit too complex for simply converting gene IDs.
Galaxy - I can't help much with this one, but I am sure it has a function for doing that, and many other things.

ADD COMMENT • link 15.8 years ago by Giovanni M Dall'Olio 28k

7

15.7 years ago

Madelaine Gogol 5.3k

If you have just a few, I just saw someone use the R package BioIDMapper and it seemed kind of neat. But it's slow.

ADD COMMENT • link 15.7 years ago by Madelaine Gogol 5.3k

0

Unfortunately, this link is now broken :/ ...

ADD REPLY • link 11.6 years ago by Samuel Lampa ★ 1.3k

1

There is a more recent version at: http://sourceforge.net/projects/bioidmapper/

ADD REPLY • link 11.6 years ago by User 59 13k

5

15.9 years ago

Mohammed Islaih ▴ 50

The following link has a list of ID conversion tools:

http://hum-molgen.org/NewsGen/08-2009/000020.html

ADD COMMENT • link updated 13.0 years ago by Istvan Albert 103k • written 15.9 years ago by Mohammed Islaih ▴ 50

4

15.7 years ago

User 59 13k

http://idconverter.bioinfo.cnio.es/

Is another possible solution to this, although you might find this is not as up to date as you might like either.

ADD COMMENT • link 15.7 years ago by User 59 13k

1

This application does not work

ADD REPLY • link 9.2 years ago by Prakki Rama ★ 2.7k

0

Ah well - 6.4 years for an online bioinformatics application isn't the worst lifespan..

ADD REPLY • link 9.2 years ago by User 59 13k

0

Thanks. I knew about that. :) My intention was to let people skip the post without clicking the link.

ADD REPLY • link 9.2 years ago by Prakki Rama ★ 2.7k

0

I would like to ask here: this tool also converts HGNC IDs to Ensembl IDs (ENSG...), but I do not get a corresponding Ensembl ID for every HGNC ID I have. Is there any way I can retrieve as many Ensembl IDs as possible for my HGNC gene IDs?

ADD REPLY • link 11.2 years ago by ivivek_ngs ★ 5.2k

3

10.0 years ago

grvpanchal ▴ 30

Try out: http://mygene.info/v2/api#MyGene.info-gene-query-service

It's Awesome!

ADD COMMENT • link updated 3.2 years ago by Ram 45k • written 10.0 years ago by grvpanchal ▴ 30

2

12.4 years ago

Samuel Lampa ★ 1.3k

Have a look at the (BETA stage) Ensembl REST API

For example, for converting from Ensembl Gene ids to Gene symbols, you could use a query like this one:

http://beta.rest.ensembl.org/xrefs/id/ENSG00000059804?content-type=application/json

... and then programmatically (some Python parsing should be rather straightforward) extract the "display_id" for the items that have "dbname" = "HGNC" or "EntrezGene".

For example, the following PHP code did the trick for me:

# Test the new Ensembl REST API with an example gene
$ensemblID = "ENSG00000157764";
$url = "http://beta.rest.ensembl.org/xrefs/id/$ensemblID?content-type=application/json";
$ensemblResultJson = file_get_contents($url);
$ensemblResult = json_decode($ensemblResultJson, true);

# Print out each found Gene symbol on a separate row:
echo "<ul>";
foreach ($ensemblResult as $mapping) {
    if ( in_array( $mapping['dbname'], array("EntrezGene","HGNC"))) {
        echo "<li>Found Gene symbol: " . $mapping['display_id'] . "</li>\n";
    }
}
echo "</ul>";
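
The Python parsing mentioned above might look roughly like this. It is only a sketch: it assumes the service is now reachable at `rest.ensembl.org` rather than the beta host used in this post, and the function names are illustrative.

```python
import json
import urllib.request

# Assumed current host; the post above used the beta address
REST_HOST = "https://rest.ensembl.org"

def fetch_xrefs(ensembl_id):
    """Download all cross-references for one Ensembl ID as a list of dicts."""
    url = "%s/xrefs/id/%s?content-type=application/json" % (REST_HOST, ensembl_id)
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))

def extract_ids(xrefs, dbnames=("EntrezGene", "HGNC")):
    """Return (dbname, display_id) pairs for the requested databases."""
    return [(x["dbname"], x["display_id"]) for x in xrefs if x.get("dbname") in dbnames]

# Offline example with records shaped like the API response (values are made up)
sample = [
    {"dbname": "HGNC", "display_id": "GENE1", "primary_id": "HGNC:0000"},
    {"dbname": "EntrezGene", "display_id": "GENE1", "primary_id": "0000"},
    {"dbname": "Uniprot_gn", "display_id": "GENE1", "primary_id": "X00000"},
]
print(extract_ids(sample))
# print(extract_ids(fetch_xrefs("ENSG00000157764")))  # needs network access
```

Separating the download from the filtering keeps the filter easy to test without hitting the server.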

ADD COMMENT • link 9.2 years ago by Samuel Lampa ★ 1.3k

2

8.9 years ago

Shicheng Guo ★ 9.6k

The easiest way is the following:

http://www.genenames.org/cgi-bin/download

ADD COMMENT • link 8.9 years ago by Shicheng Guo ★ 9.6k

2

8.0 years ago

Jerry Zhu ▴ 80

GeneID convert:

http://www.ensembl.org/biomart/martview

http://mygene.info/

https://biodbnet-abcc.ncifcrf.gov/

http://biodb.jp/

ADD COMMENT • link 8.0 years ago by Jerry Zhu ▴ 80

2

5.7 years ago

tamerg ▴ 100

biobtree is also a strong alternative for simple or advanced identifier mapping and data retrieval, for small or large datasets, with R and Python packages.

ADD COMMENT • link 5.7 years ago by tamerg ▴ 100

1

16.0 years ago

Istvan Albert 103k

I don't know of a direct solution myself, but this is a topic that may be of interest for the biological data analysis class that I am teaching.

If you specify the organism/genomic builds that you are interested in, we may be able to generate a full translation list as an in-class example or a homework. I was planning on covering an Affymetrix ID to GenBank example anyhow.

ADD COMMENT • link 16.0 years ago by Istvan Albert 103k

0

Thanks! That's great! But I'm not a student there... Can I get access to that anyway? I am using the Human whole genome Agilent array. Thank you so much.

ADD REPLY • link 16.0 years ago by Renee ▴ 620

0

missed this comment, sorry about it!

ADD REPLY • link 15.8 years ago by Istvan Albert 103k

1

11.9 years ago

aheinzel ▴ 130

Not sure what your background is, however, we recently started to develop an id mapper / converter for experimentalists who prefer organizing their data in Excel. Therefore, the client directly integrates into MS Excel.

Currently, we provide the possibility to map from various IDs to Ensembl and back. The mapping data were extracted from Ensembl 73 (released on the 4.9.2013). If you need mappings for any additional ID types available from the Ensembl database, we will be happy to add them (please just tell us via our feedback form).

ADD COMMENT • link 11.9 years ago by aheinzel ▴ 130

0

@aheinzel

Is it possible to use this tool to generate Ensembl IDs (ENSG IDs) from HGNC gene IDs for human? Also, does it work on Mac or is it just for Windows?

ADD REPLY • link updated 3.8 years ago by Ram 45k • written 11.2 years ago by ivivek_ngs ★ 5.2k

0

I don't really understand the need for this. Many identity mappers offer web services, and if needed these can be installed locally. That is definitely true for our own BridgeDb. Is there any reason you could not just call these services from Excel? (And yes, that would allow mapping from Ensembl gene IDs to HGNC or from probeset IDs.)

ADD REPLY • link updated 3.8 years ago by Ram 45k • written 11.2 years ago by Chris Evelo 10k

1

5.8 years ago

Shicheng Guo ★ 9.6k

Here is an R script to do it:

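library("org.Hs.eg.db")
# Named list: gene symbol/alias -> one or more Entrez Gene IDs
symbol <- as.list(org.Hs.egALIAS2EG)
# stack() gives one row per (Entrez ID, alias) pair, so aliases that map to
# several IDs are not collapsed into a single string
symbol2geneid <- stack(symbol)
colnames(symbol2geneid) <- c("entrez_id", "alias")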

ADD COMMENT • link 5.8 years ago by Shicheng Guo ★ 9.6k

1

5.8 years ago

ravinsit06 ▴ 10

bioDBnet is a great tool for database-to-database ID conversion. The link is https://biodbnet-abcc.ncifcrf.gov/db/db2db.php

You can convert a gene ID to many different database identifiers, such as Agilent ID, GO ID or Pathway ID. Check the link to see all the databases available.

I have used it to convert Gene IDs to GO IDs.

ADD COMMENT • link 5.8 years ago by ravinsit06 ▴ 10
