How to pass multiple seq_start and seq_stop parameters to NCBI efetch
8.2 years ago

Hi,
I wasn't able to find anywhere how to pass several seq_start and seq_stop optional arguments to a list of queries for NCBI efetch.
For a single record, this works:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=433294648&rettype=fasta&seq_start=100&seq_stop=200

Server answer:
>gb|CP003078.1|:100-200 Mycobacterium sp. JS623, complete genome
GGGTCGCAGCCGTATCGCCACGTTCGGGCGACTGTTCGAGGGTACTGACGACATTTCGCTGGGTCAAACC
TCGCCCGAGCGATCCCGGGTCACCGCCCGCA
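For reference, the working single-record request above can be wrapped in a small bash helper. This is only a minimal sketch: the function name `efetch_range_url` is hypothetical, and actually fetching the URL assumes curl and network access.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the efetch URL for ONE record and ONE range,
# using the same parameters as the working request above.
efetch_range_url() {
  local id="$1" start="$2" stop="$3"
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=%s&rettype=fasta&seq_start=%s&seq_stop=%s\n' \
    "$id" "$start" "$stop"
}

# Example (needs curl and network access):
# curl -s "$(efetch_range_url 433294648 100 200)"
```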

And now multiple queries:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=433294648,755160968&rettype=fasta

Server answer: two complete FASTA records in one file, returned almost instantly.

Question: does anybody know whether it is possible to combine the two, i.e. to obtain one short FASTA record per posted UID, with the range determined by the seq_start and seq_stop arguments? The server's answer to something like:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=433294648,755160968&rettype=fasta&seq_start=100,200&seq_stop=200,500

would be:
>gb|xxxxxxxx.x|:100-200 orgn x
GGGTCGCAGCCGTATCGCCACGTTCGGGCGACTGTTCGAGGGTACTGACGACATTTCGCTGGGTCAAACC
TCGCCCGAGCGATCCCGGGTCACCGCCCGCA
>gb|yyyyyyyy.y|:200-500 orgn y
GGGTCGCAGCCGTATCGCCACGTTCGGGCGACTGTTCGAGGGTACTGACGACATTTCGCTGGGTCAAACC
TCGCCCGAGCGATCCCGGGTCACCGCCCGCA

What I have tried so far: comma-separated lists of seq_start and seq_stop values, wrapping them in [], adding +AND+, adding semicolons, anything I could think of.
I know how to solve this with a for-loop, but it would help me a lot if I could do this in 'batch' mode.

Any suggestion would be appreciated. Thanks a lot.

PS: I have already asked this here: C: Fetching Genbank Entries For List Of Accession Numbers, but it felt a little off-topic there and the question was not elaborated.

sequence NCBI efetch ENTREZ

You can use the Unix E-utilities (EDirect) and write a bash script that parses the file and takes the seq_start and seq_stop values from each line. A sample command would be:

efetch -id 433294648 -format fasta -db nucleotide -seq_start 100 -seq_stop 200
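A minimal sketch of such a script, assuming a hypothetical tab-separated input file with one `accession<TAB>start<TAB>stop` per line. The `fetch_ranges` function and the `EFETCH_CMD` dry-run variable are illustrative only, not part of EDirect; the efetch flags are the ones from the sample command above. It still makes one request per region, and the `sleep` keeps the loop under NCBI's guideline of at most 3 requests per second without an API key.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read "accession<TAB>start<TAB>stop" lines from a file
# and run one EDirect efetch call per region, writing FASTA to stdout.
# Set EFETCH_CMD="echo efetch" to print the commands instead of running them.
fetch_ranges() {
  local infile="$1"
  local cmd="${EFETCH_CMD:-efetch}"
  while IFS=$'\t' read -r id start stop; do
    if [ -z "$id" ]; then
      continue                          # skip blank lines
    fi
    $cmd -db nucleotide -id "$id" -format fasta \
         -seq_start "$start" -seq_stop "$stop"
    sleep 0.34                          # stay under ~3 requests/second
  done < "$infile"
}
```

Usage: `fetch_ranges regions.tsv > regions.fasta`, where `regions.tsv` is a placeholder name for your own list of regions.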

PS: NCBI is phasing out GI numbers, so it is recommended to use accession numbers instead.


Hi, thank you for the reply. I know that I can do that in a for-loop (and I am currently doing so), but since I want to fetch relatively short fragments, I would like to fetch them all (a reasonable number) with one command to limit the calls to the NCBI server. Or am I missing something, and this is what the Unix e-utils would do by themselves?

Re PS: Yes, I know. Currently it works with accessions, but that is undocumented according to http://www.ncbi.nlm.nih.gov/books/NBK25499/

8.2 years ago

Hi,
in case it helps anyone in the future: after consulting NCBI Entrez support, it appears that this functionality is not, and will not be, supported.
It's a shame. So don't waste your time; for-loop forever.

The good folks at NCBI support wrote:
No that will not be possible. The starts and stops must be singly for the id requested.
