Question

taxid to genome refseq accession number

0

6.4 years ago

ruchikabhat31 ▴ 60

Dear all, I have a list of taxids like: 10243 10244 10246 10247 10248 10249

I am looking for their corresponding RefSeq genome accession numbers. For example, I searched manually for taxid 10243 and found the RefSeq genome accession number NC_003663.2. Please guide. Thanks.

genome taxid refseq accession number genome • 4.7k views

ADD COMMENT • link 6.4 years ago by ruchikabhat31 ▴ 60

1

6.4 years ago

GenoMax 147k

You can also try my answer here: A: How to retrieve any and all NCBI/GenBank accession numbers from a Taxonomy ID?

ADD COMMENT • link 6.4 years ago by GenoMax 147k

0

The solution given there seems to fetch GI numbers, which is not what I require. What I need is the whole-genome RefSeq accession number for each taxid. Thanks anyway for the help.

ADD REPLY • link 6.4 years ago by ruchikabhat31 ▴ 60

0

Did you miss that part?

Since you want accession numbers add step 4a: Under "Summary" on left side of the page choose "Format" --> "Accession list".

ADD REPLY • link 6.4 years ago by GenoMax 147k

0

Yes, sure. But that is a manual approach; I am looking for a script/program, as the ID list runs into lakhs (hundreds of thousands). Once again, thanks for your help.

ADD REPLY • link 6.4 years ago by ruchikabhat31 ▴ 60

1

6.4 years ago

ruchikabhat31 ▴ 60

Thank you all for your kind help and direction.

I have, however, used a different approach to gather the accession numbers, as I couldn't install efetch and esearch (E-utilities) on my system.

Also, the manual way was impossible for such a huge dataset.

My approach is a little exhaustive, but it helped me, so I am sharing it in case others need it:

wget url:

wget "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10234&lvl=3&lin=f&keep=1&srchmode=1&unlock"

Here I replaced the taxid with $i, which is read from a file named list:

for i in $(cat list); do wget "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=${i}&lvl=3&lin=f&keep=1&srchmode=1&unlock"; done

wget then saves one page per taxid, in a file named after the URL. For each $i, grep that file (the ? in the pattern is escaped so that -E matches it literally):

grep -E "Scientific name|/genome/\?term=txid${i}" "wwwtax.cgi?mode=Info&id=${i}&lvl=3&lin=f&keep=1&srchmode=1&unlock" > "Details_${i}"

This saves in Details_$i the details of each taxid whose genome is available. For example, taxid 10244 has a genome link:

https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10244&lvl=3&lin=f&keep=1&srchmode=1&unlock

whereas taxid 10234 does not:

https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10234&lvl=3&lin=f&keep=1&srchmode=1&unlock

So the Details files record which taxids have a genome. For those, get the NC_**** accession numbers from the following URL:

https://www.ncbi.nlm.nih.gov/genome/?term=txid10244[Organism:exp]

Hope this might help someone in future too, or someone may improve this to make it more organised.
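One way the steps above could be organised is sketched below. It is only an illustration, assuming wget and grep are available and that the Taxonomy Browser pages still contain the /genome/?term=txid... link described in the post; the function names, the pages directory and the taxid_<id>.html file naming are hypothetical choices, not part of the original workflow.

```shell
# Taxonomy Browser URL pattern copied from the post; only the id changes.
taxonomy_url() {
    printf 'https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=%s&lvl=3&lin=f&keep=1&srchmode=1&unlock' "$1"
}

# Succeeds if the saved page ($2) contains a Genome link for taxid $1.
# The "?" is escaped so that -E matches it literally, and the trailing
# group stops txid1024 from matching txid10244.
has_genome_link() {
    grep -qE "/genome/\?term=txid$1([^0-9]|\$)" "$2"
}

# Download one page per taxid listed in file $1 into directory $2
# (hypothetical layout) and print the taxids whose page links to a genome.
taxids_with_genome() {
    local taxid page
    mkdir -p "$2"
    while read -r taxid || [ -n "$taxid" ]; do
        [ -z "$taxid" ] && continue              # skip blank lines
        page="$2/taxid_${taxid}.html"            # hypothetical file name
        wget -q -O "$page" "$(taxonomy_url "$taxid")" || continue
        if has_genome_link "$taxid" "$page"; then
            echo "$taxid"
        fi
    done < "$1"
}
```

Running taxids_with_genome list pages would print only the taxids that have a genome page, which can then be looked up through the genome URL above.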

Thanks once again biostars, especially Sej Modha and genomax for your help and kind guidance.

Thank you

ADD COMMENT • link updated 6.4 years ago by finswimmer 16k • written 6.4 years ago by ruchikabhat31 ▴ 60

0

Hello ruchikabhat31,

thank you for responding with a detailed description of your final solution.

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

ADD REPLY • link 6.4 years ago by finswimmer 16k

0

Thank you finswimmer, for your help this time. I shall keep that in mind for the next time.

ADD REPLY • link 6.4 years ago by ruchikabhat31 ▴ 60

score 4 · Accepted Answer · 2018-06-25

4

6.4 years ago

Sej Modha 5.3k

The easiest way is to search the nuccore database and restrict the results to RefSeq with a filter.

For example,

esearch -db nuccore -query "txid10242[Organism:exp] AND refseq[filter]"|efetch -format acc
NC_037656.1
NC_031033.1
NC_031038.1
NC_003663.2
NC_006998.1
NC_027213.1
NC_008291.1
NC_004105.1
NC_003391.1
NC_003310.1
NC_001611.1

esearch -db nuccore -query "txid10243[Organism:exp] AND refseq[filter]"|efetch -format acc
NC_003663.2
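
For a long taxid list, the two commands above can be wrapped in a loop. The following is a minimal sketch, assuming EDirect (esearch, efetch) is installed and the taxids sit one per line in a file. The function names refseq_query and taxids_to_refseq, the tab-separated output and the one-second pause are illustrative choices, not part of the original answer.

```shell
# Build the Entrez query used above for a single taxid.
refseq_query() {
    printf 'txid%s[Organism:exp] AND refseq[filter]' "$1"
}

# Read taxids (one per line) from the file given as $1 and print
# "taxid<TAB>accession" for every RefSeq nuccore record found.
taxids_to_refseq() {
    local taxid acc
    while read -r taxid || [ -n "$taxid" ]; do
        [ -z "$taxid" ] && continue              # skip blank lines
        # </dev/null so esearch cannot swallow the rest of the taxid list
        esearch -db nuccore -query "$(refseq_query "$taxid")" </dev/null |
            efetch -format acc |
            while read -r acc; do
                if [ -n "$acc" ]; then
                    printf '%s\t%s\n' "$taxid" "$acc"
                fi
            done
        sleep 1                                  # assumed pause between requests
    done < "$1"
}
```

Calling taxids_to_refseq list > taxid2acc.tsv would then write one line per hit (list and taxid2acc.tsv are placeholder file names). Taxids without a RefSeq record simply produce no line.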

ADD COMMENT • link 6.4 years ago by Sej Modha 5.3k

0

My system does not support these utilities, so I get a "command not found" error. Is there a curl/wget link to get the NC_XXX data for each taxid? Would any other workaround be possible?

ADD REPLY • link 6.4 years ago by ruchikabhat31 ▴ 60

0

You'd have to install these utilities on your computer; they can be downloaded from: ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/

You might also find this eutils tutorial helpful.
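
If installing EDirect is not an option, the same E-utilities can also be reached over plain HTTP with curl. The sketch below is an assumption-laden illustration rather than a tested recipe: it runs the query from the answer through esearch.fcgi with usehistory=y, then asks efetch.fcgi for the accession list with rettype=acc. The helper names xml_field and refseq_accessions are hypothetical, and the XML is parsed with a simple sed expression that assumes each tag opens and closes on one line.

```shell
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Print the text of the first <tag>...</tag> element found on stdin.
xml_field() {
    sed -n "s:.*<$1>\(.*\)</$1>.*:\1:p" | head -n 1
}

# Print the RefSeq nuccore accessions for the taxid given as $1.
refseq_accessions() {
    local taxid="$1" xml webenv key
    # Step 1: run the search and keep the result set on the history server.
    xml="$(curl -s -G "$EUTILS/esearch.fcgi" \
        --data-urlencode "db=nuccore" \
        --data-urlencode "term=txid${taxid}[Organism:exp] AND refseq[filter]" \
        --data-urlencode "usehistory=y")"
    webenv="$(printf '%s\n' "$xml" | xml_field WebEnv)"
    key="$(printf '%s\n' "$xml" | xml_field QueryKey)"
    [ -n "$webenv" ] && [ -n "$key" ] || return 1
    # Step 2: fetch the stored result set as a plain-text accession list.
    curl -s -G "$EUTILS/efetch.fcgi" \
        --data-urlencode "db=nuccore" \
        --data-urlencode "query_key=$key" \
        --data-urlencode "WebEnv=$webenv" \
        --data-urlencode "rettype=acc" \
        --data-urlencode "retmode=text"
}
```

refseq_accessions 10243 would be the curl counterpart of the second esearch | efetch command in the answer. For a list running to hundreds of thousands of taxids, requests should be spaced out to respect NCBI's usage limits.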

ADD REPLY • link 6.4 years ago by Sej Modha 5.3k