Converting gene IDs to RefSeq IDs or gene symbols
4.0 years ago
sgupta ▴ 20

Hi,

I have a large list of gene IDs, such as 0610005C13Rik and 0610007P14Rik, from mouse RNA-Seq data. I want to convert them to either RefSeq IDs or common gene symbols in R using a Bioconductor package. I'm new to this and open to other approaches.

4.0 years ago
vkkodali_ncbi ★ 3.8k

You can use NCBI Datasets for this. Go to the NCBI Data Tables page, upload a file containing the list of gene names, and choose 'mouse' as the organism. You can customize the output table to include or exclude the columns you are interested in, then download it as a tab-delimited file.

Alternatively, you can use the NCBI Datasets command-line tool to download the data in JSON format and parse it with a tool such as jq to extract the fields you need.


It looks like Data Tables is not accepting the IDs that the user provided above.


I just checked, and they seem to work fine. I used the "Enter identifiers manually" option, pasted the identifiers (one per line), chose "gene symbol" as the identifier type and "mouse" as the organism, and got a results table. Do you see any error messages?


I see it now. It is not obvious that the default organism, human, can be replaced with another species: you have to delete "human" manually, type a different name, and press Return before any options appear. It would help if the second box showed a down-arrow to indicate that other options are available.

4.0 years ago
Gordon Smyth ★ 7.7k

This is the method I use. First, download https://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz

Then run:

> library(limma)
> Aliases <- c("0610005C13Rik", "0610007P14Rik")
> GeneAnnotation <- alias2SymbolUsingNCBI(Aliases, "Mus_musculus.gene_info.gz")
> GeneAnnotation
      GeneID        Symbol                description
15939  71661 0610005C13Rik RIKEN cDNA 0610005C13 gene
9914   58520         Erg28 ergosterol biosynthesis 28

Note that the first ID you give is already the official gene symbol. The second ID is an alias for Erg28.

The GeneID column in the above table is the NCBI Entrez Gene ID.
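For readers curious about what an alias lookup against the gene_info file involves, here is a minimal base-R sketch. It is not limma's implementation: `resolve_aliases` is a hypothetical helper, and the two-row `gene_info` table is a toy excerpt built from the output above, with its Synonyms values simplified for illustration rather than copied from the real file. Official symbols are matched first and pipe-separated synonyms second; an ambiguous alias simply takes the first hit.

```r
# Hypothetical helper: map each alias to an NCBI gene, preferring official symbols.
# Assumes a gene_info-style data frame with GeneID, Symbol, Synonyms and
# description columns, where Synonyms is pipe-separated and "-" means none.
resolve_aliases <- function(aliases, gene_info) {
  # Step 1: exact matches against the official symbols.
  idx <- match(aliases, gene_info$Symbol)

  # Step 2: flatten the pipe-separated synonyms, remembering the row each came from.
  syn <- strsplit(gene_info$Synonyms, "|", fixed = TRUE)
  syn_row <- rep(seq_along(syn), lengths(syn))
  syn_flat <- unlist(syn, use.names = FALSE)
  keep <- syn_flat != "-"  # drop the "no synonyms" placeholder
  syn_row <- syn_row[keep]
  syn_flat <- syn_flat[keep]

  # Step 3: aliases without an official-symbol match fall back to synonyms.
  # match() returns the first hit, so ambiguous aliases are not flagged here.
  unmatched <- is.na(idx)
  idx[unmatched] <- syn_row[match(aliases[unmatched], syn_flat)]

  data.frame(
    Alias = aliases,
    GeneID = gene_info$GeneID[idx],
    Symbol = gene_info$Symbol[idx],
    description = gene_info$description[idx],
    stringsAsFactors = FALSE
  )
}

# Toy two-row excerpt (Synonyms simplified for illustration).
# With the real file you could instead use something like:
# gene_info <- read.delim("Mus_musculus.gene_info.gz", quote = "", stringsAsFactors = FALSE)
gene_info <- data.frame(
  GeneID = c(71661L, 58520L),
  Symbol = c("0610005C13Rik", "Erg28"),
  Synonyms = c("-", "0610007P14Rik"),
  description = c("RIKEN cDNA 0610005C13 gene", "ergosterol biosynthesis 28"),
  stringsAsFactors = FALSE
)

res <- resolve_aliases(c("0610005C13Rik", "0610007P14Rik"), gene_info)
print(res)
```

Aliases that match neither a symbol nor a synonym come back with NA in the GeneID, Symbol and description columns.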

4.0 years ago

biomartr or biomaRt may be a good choice, or you can use the Ensembl BioMart website to download the mapping and convert the IDs yourself. There are also other offline and online conversion tools, such as DAVID.
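For the biomaRt route, a minimal sketch might look like the following. It assumes biomaRt is installed from Bioconductor and that the Ensembl BioMart server is reachable. The attribute and filter names used here (`mgi_symbol`, `entrezgene_id`, `refseq_mrna`) are current Ensembl names but can change between releases, so check them with `listAttributes(mart)` and `listFilters(mart)`. Because the query filters on the official MGI symbol, an alias such as 0610007P14Rik will probably return no rows, which is why the example queries Erg28 instead; resolving aliases first is still useful.

```r
# Assumes biomaRt is installed (e.g. BiocManager::install("biomaRt"))
# and that network access to Ensembl is available.
library(biomaRt)

# Connect to the Ensembl mouse gene dataset.
mart <- useEnsembl(biomart = "genes", dataset = "mmusculus_gene_ensembl")

# Official symbols; aliases are unlikely to match the mgi_symbol filter.
symbols <- c("0610005C13Rik", "Erg28")

# Returns one row per symbol / Entrez ID / RefSeq mRNA combination,
# so a gene with several RefSeq transcripts appears on several rows.
res <- getBM(
  attributes = c("mgi_symbol", "entrezgene_id", "refseq_mrna"),
  filters = "mgi_symbol",
  values = symbols,
  mart = mart
)
head(res)
```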
