Question

Error in fetching the Refseq using Taxonomic ID

0

7.3 years ago

Paul ▴ 80

I have been trying to extract the reference sequences for the list of taxonomic IDs I have like:

So, for 1438843, the reference sequence is NC_000962.3 and I need to download this particular reference sequence with respect to taxonomic ID 1438843.

When I try to fetch the RefSeq sequence for the first taxonomic ID using the following E-utilities command:

esearch -db genome -query "txid1438843 [Organism]" | elink -target nuccore | efilter -query "refseq"| efetch -format fasta

It shows an error like:

ERROR in filt input: callMLink: Query failed on MegaLink server

ERROR in fetch input: callMLink: Query failed on MegaLink server

Could anyone suggest a way to fetch the RefSeq sequences for the above-mentioned taxonomic IDs?

eutility NCBI Reference sequence • 3.9k views

ADD COMMENT • link updated 3.1 years ago by Nelo ▴ 20 • written 7.3 years ago by Paul ▴ 80

0

Hello, good evening.

How can I extract the protein IDs for the given gene IDs?

gene ID: AB845604 AB845605 AB845606 AB845607 AB845608 AB845609 AB845610

Thanks in advance.

ADD REPLY • link 3.1 years ago by Nelo ▴ 20

0

It is not a good practice to ask unrelated questions in pre-existing threads.

You can do the following with EntrezDirect:

$ esearch -db gene -query "AB845604" | elink -target protein | efetch -format acc
NP_001289835.2
BAO18621.1

Use a for loop to go through your list.
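The for loop suggested above could look roughly like the sketch below. The function names `fetch_protein_acc` and `map_ids_to_proteins` and the `EDIRECT_DELAY` variable are hypothetical, chosen only for illustration; the pipeline inside `fetch_protein_acc` is the one from this reply, and the pause between queries is an assumption about being polite to the NCBI servers rather than a documented requirement.

```shell
#!/usr/bin/env bash
# Accessions from the question above.
ids="AB845604 AB845605 AB845606 AB845607 AB845608 AB845609 AB845610"

# fetch_protein_acc: hypothetical wrapper around the pipeline shown above.
fetch_protein_acc() {
  esearch -db gene -query "$1" | elink -target protein | efetch -format acc
}

# map_ids_to_proteins: print "query<TAB>protein accession", one line per hit.
# EDIRECT_DELAY is an illustrative knob: seconds to pause between queries.
map_ids_to_proteins() {
  local id acc
  for id in "$@"; do
    fetch_protein_acc "$id" | while read -r acc; do
      if [ -n "$acc" ]; then printf '%s\t%s\n' "$id" "$acc"; fi
    done
    sleep "${EDIRECT_DELAY:-1}"
  done
}

# Usage (needs EDirect and network access); $ids is left unquoted on
# purpose so that the shell splits it into one argument per accession:
#   map_ids_to_proteins $ids > id_to_protein.tsv
```

Keeping the query ID in the first column makes it easy to see which accessions returned several proteins and which returned none.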

ADD REPLY • link 3.1 years ago by GenoMax 147k

0

Oops, my mistake. I am not very familiar with this, but I will keep it in mind next time. Actually, I got the same issue as in this thread:

" ERROR in filt input: callMLink: Query failed on MegaLink server "

while I was trying out some commands to convert to the protein ID.

ADD REPLY • link 3.1 years ago by Nelo ▴ 20

score 3 · Accepted Answer · 2017-08-03

3

7.3 years ago

Sej Modha 5.3k

I was able to download the sequences by removing the space between txid1438843 and [Organism]

esearch -db genome -query "txid1438843[Organism]" | elink -target nuccore | efilter -query "refseq"|efetch -format fasta
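The fix above comes down to keeping the `txid` term and its `[Organism]` field tag together. As a minimal sketch (the helper name `taxid_query` is hypothetical and not part of EDirect), a small shell function can build the query string so that a stray space never reaches `esearch`:

```shell
#!/usr/bin/env bash
# taxid_query: hypothetical helper, not part of EDirect.
# Builds "txid<ID>[Organism]" with no space before the field tag,
# stripping any whitespace that came along with the ID.
taxid_query() {
  local id
  id=$(printf '%s' "$1" | tr -d '[:space:]')
  printf 'txid%s[Organism]' "$id"
}

# Usage (needs EDirect and network access), same pipeline as above:
#   esearch -db genome -query "$(taxid_query 1438843)" |
#     elink -target nuccore | efilter -query "refseq" | efetch -format fasta
```

This is mostly useful when the IDs come from a spreadsheet export or a file with trailing whitespace.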

ADD COMMENT • link 7.3 years ago by Sej Modha 5.3k

0

But there is only one reference sequence (NC_000962.3) for the taxonomic ID query "txid1438843[Organism]":

https://www.ncbi.nlm.nih.gov/genome/?term=txid1438843+%5BOrganism%5D

ADD REPLY • link 7.3 years ago by Paul ▴ 80

0

Thanks, it no longer shows the error. But it returns multiple sequences, whereas I need only the one RefSeq sequence (NC_000962.3).

ADD REPLY • link 7.3 years ago by Paul ▴ 80

2

I am going to hazard a guess that, since you are using a taxID (for Mycobacterium tuberculosis), every M. tuberculosis genome in the RefSeq database (currently 5248) is going to be pulled up. You probably need an additional filter on your query.
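One way to check this before downloading anything is to stop the pipeline before `efetch` and look at the record count. The sketch below assumes that `esearch`, `elink` and `efilter` pass along a small `ENTREZ_DIRECT` XML envelope containing a `<Count>` element; the helper name `count_from_envelope` is hypothetical, and it uses plain `sed` rather than an EDirect tool.

```shell
#!/usr/bin/env bash
# count_from_envelope: hypothetical helper, not part of EDirect.
# Reads the XML envelope passed between EDirect steps on stdin and prints
# the number inside the first <Count> element (assumed layout, see above).
count_from_envelope() {
  sed -n 's:.*<Count>\([0-9][0-9]*\)</Count>.*:\1:p' | head -n 1
}

# Usage (needs EDirect and network access): how many records would the
# unfiltered pipeline fetch?
#   esearch -db genome -query "txid1438843[Organism]" |
#     elink -target nuccore | efilter -query "refseq" | count_from_envelope
```

A count far above one is a sign that the query needs a further filter before `efetch`.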

ADD REPLY • link 7.3 years ago by GenoMax 147k

2

That's right, you will need another filter that selects the representative assembly. The following command returns the FASTA sequence for NC_000962.3.

esearch -db genome -query "txid1438843[Organism]"|elink -target assembly|efilter -query "representative[PROP]"|elink -target nuccore -name assembly_nuccore_refseq|efetch -format fasta
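Since the original question involves a list of taxonomic IDs, the command above can be wrapped in a loop. This is a sketch under stated assumptions: the function names `fetch_representative_refseq` and `download_refseqs`, the input file `taxids.txt` (one ID per line) and the output directory name are all hypothetical. The list is read on file descriptor 3 because EDirect commands inside a `while read` loop may consume standard input and silently end the loop early.

```shell
#!/usr/bin/env bash
# fetch_representative_refseq: hypothetical wrapper around the pipeline above.
fetch_representative_refseq() {
  esearch -db genome -query "txid${1}[Organism]" |
    elink -target assembly |
    efilter -query "representative[PROP]" |
    elink -target nuccore -name assembly_nuccore_refseq |
    efetch -format fasta
}

# download_refseqs LIST OUTDIR: read one taxonomic ID per line from LIST and
# write OUTDIR/<taxid>.fasta. The list is read on file descriptor 3 so that
# commands inside the loop cannot swallow the remaining IDs from stdin.
download_refseqs() {
  local list=$1 outdir=$2 taxid
  mkdir -p "$outdir"
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
  while read -r taxid <&3 || [ -n "$taxid" ]; do
    if [ -z "$taxid" ]; then continue; fi   # skip blank lines
    if ! fetch_representative_refseq "$taxid" > "$outdir/$taxid.fasta"; then
      echo "fetch failed for txid$taxid" >&2
    fi
  done 3< "$list"
}

# Usage (needs EDirect and network access; file names are assumptions):
#   download_refseqs taxids.txt refseq_fasta
```

Writing one file per taxonomic ID makes it easy to spot IDs that returned nothing (an empty file) or more than one sequence.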

ADD REPLY • link 7.3 years ago by Sej Modha 5.3k

1

Nice! Looks like you know your entrez utilities by heart.

Slightly unrelated question. Is there a chart representation of what can/should be logically connected with what for various entrez utilities? I find the in-line help severely lacking except for providing bare syntax.

ADD REPLY • link 7.3 years ago by GenoMax 147k

0

I don't think a document like that exists. @Joseph Hughes had asked a similar question: NCBI database schema. I tend to work out such complicated queries on the NCBI web pages first, and then try to recreate those links using eutils with a bit of help from https://www.ncbi.nlm.nih.gov/books/NBK179288/.
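Short of a chart, the link names available from a given database can be listed on the command line, which helps when recreating a web-page link as an `elink -name` argument. The sketch below assumes that `einfo -db <database> -links` prints one link per line, as described in the EDirect documentation linked above; the helper name `list_links` is hypothetical.

```shell
#!/usr/bin/env bash
# list_links DB [PATTERN]: hypothetical helper, not part of EDirect.
# Lists the links defined for DB, optionally keeping only the lines that
# contain PATTERN (case-insensitive). An empty pattern keeps every line.
list_links() {
  einfo -db "$1" -links | grep -i -- "${2:-}"
}

# Usage (needs EDirect and network access): which links lead from
# assembly to nuccore?
#   list_links assembly nuccore
```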