Question

Convert Gene Name to RefSeq ID

0

7.6 years ago

tlorin ▴ 370

Dear all,

I have seen this post that allows gene conversion from RefSeq IDs to gene names. What I would like is a tool (command line or web-based) that:

takes as input a list of gene names (shortcut name or comprehensive gene name) AND a query species
outputs the list of RefSeq IDs

In my case:

the species would be Stegastes partitus
the gene names would be:

LOC103370819
prtfdc1
LOC103367872
tfec
colony stimulating factor 1 receptor (csf1r)

I first thought of using blastdbcmd but it seems that blastdbcmd does not take gene names as input.

$ blastdbcmd -db nt_Spar -dbtype nucl -entry prtfdc1   # nt_Spar is a subset of nt with only S. partitus sequences
Error: prtfdc1: OID not found

I have also tried Batch Entrez, but it does not accept gene names as input either.

Many thanks for your help!

RNA-Seq ncbi sequence blast • 4.7k views

ADD COMMENT • link updated 7.6 years ago by Sej Modha 5.3k • written 7.6 years ago by tlorin ▴ 370

0

Have you tried https://biodbnet-abcc.ncifcrf.gov/db/db2db.php

ADD REPLY • link 7.6 years ago by Sej Modha 5.3k

0

I didn't know this tool! But I cannot make it work B-)

ADD REPLY • link 7.6 years ago by tlorin ▴ 370

score 2 · Accepted Answer · 2017-10-04

2

7.6 years ago

GenoMax 151k

You can get accession numbers for those genes by using NCBI eUtils (the Entrez Direct command-line tools). Here is an example:

esearch -db nuccore -query "prtfdc1 [Gene] AND Stegastes partitus [ORGN]" | efetch -format docsum | xtract -pattern Caption -element Caption

This produces

XM_008294096
NW_007578669

You would want the NW* numbers. In that case, add a pipe to grep NW at the end of the command above (an unquoted NW* could be expanded by the shell before grep sees it).

ADD COMMENT • link 7.6 years ago by GenoMax 151k
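
For a whole list of gene symbols, the same pipeline could be wrapped in a small loop. This is only a minimal sketch, not something from the answer itself: the function names build_query and gene_to_accessions and the input file genes.txt are hypothetical, and it assumes the Entrez Direct commands are already on your PATH.

```shell
# Sketch: look up nuccore accessions for gene symbols in one species.
# build_query and gene_to_accessions are hypothetical helper names.

# Build the Entrez query string in the same form as the answer's example.
build_query() {
    local gene="$1" species="$2"
    printf '%s [Gene] AND %s [ORGN]' "$gene" "$species"
}

# Run the esearch | efetch | xtract pipeline for one gene and
# print "gene<TAB>accession", one line per hit.
# stdin of esearch is redirected from /dev/null so that it cannot
# consume the gene list when this function is called inside a read loop.
gene_to_accessions() {
    local gene="$1" species="$2"
    esearch -db nuccore -query "$(build_query "$gene" "$species")" < /dev/null \
        | efetch -format docsum \
        | xtract -pattern Caption -element Caption \
        | while read -r acc; do
              if [ -n "$acc" ]; then
                  printf '%s\t%s\n' "$gene" "$acc"
              fi
          done
}

# Usage (genes.txt is a hypothetical file with one gene symbol per line):
#   while read -r gene; do
#       gene_to_accessions "$gene" "Stegastes partitus"
#   done < genes.txt
```

Keeping the gene symbol in the first column makes it easy to spot genes that returned no hit at all.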

0

It didn't work at first: all the commands (esearch, efetch, xtract) need to be in the PATH (obviously). Works perfectly now, thanks!

ADD REPLY • link 7.6 years ago by tlorin ▴ 370
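
A quick way to check for the PATH problem described above is to ask the shell whether it can find each command. This is a minimal sketch; missing_tools is a hypothetical helper name, and the install directory in the final comment is an assumption that may differ on your system.

```shell
# Print the name of every command given as an argument that is NOT on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Prints nothing when all three Entrez Direct commands are found.
missing_tools esearch efetch xtract

# If any are listed, add the directory that contains them to PATH, e.g.
#   export PATH="$HOME/edirect:$PATH"   # assumed install location
```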

0

@genomax: how would you do this with a complete gene name instead of the shortcut? For instance, Stegastes partitus phosphoribosyl transferase domain containing 1.

This command does not output anything:

./esearch -db nuccore -query "Stegastes partitus phosphoribosyl transferase domain [Gene] AND Stegastes partitus [ORGN]" | ./efetch -format docsum | ./xtract -pattern Caption -element Caption

ADD REPLY • link 7.6 years ago by tlorin ▴ 370

1

esearch -db nuccore -query "Stegastes partitus phosphoribosyl transferase domain containing 1 AND Stegastes partitus [ORGN]" | efetch -format docsum | xtract -pattern Caption -element Caption | grep NW

NW_007577984
NW_007578669

While that seems to generate a result, those accessions are for the genomic entries. I guess you may not be able to make some of them work.

ADD REPLY • link 7.6 years ago by GenoMax 151k
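
Because the pipeline can return both transcript and genomic records, the RefSeq accession prefix is one way to tell them apart: NM_ and XM_ are mRNA records, while NC_, NG_, NT_ and NW_ are genomic. A minimal sketch of that filter follows; classify_accession is a hypothetical helper name, and the category labels are just illustrative strings.

```shell
# Classify a RefSeq accession by its prefix.
# classify_accession is a hypothetical helper name.
classify_accession() {
    case "$1" in
        NM_*|XM_*)           echo "mRNA" ;;
        NR_*|XR_*)           echo "non-coding RNA" ;;
        NP_*|XP_*)           echo "protein" ;;
        NC_*|NG_*|NT_*|NW_*) echo "genomic" ;;
        *)                   echo "other" ;;
    esac
}

# Label the two accessions returned for prtfdc1 earlier in the thread.
for acc in XM_008294096 NW_007578669; do
    printf '%s\t%s\n' "$acc" "$(classify_accession "$acc")"
done
```

Piping the xtract output through a filter like this lets you keep whichever record type you actually need instead of relying on a single grep pattern.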