Question

Maximum number of efetch

0

7.7 years ago

horsedog ▴ 60

Hi,

I'm retrieving protein sequences by UID using efetch. I have more than 30,000 UIDs, but after running my Python script I got exactly 10,000 sequences. Is there a limit on the number of records that can be retrieved, such as 10,000? Is it possible to get all 30,000+ in one go? Thanks

Edirect • 4.7k views

ADD COMMENT • link updated 7.7 years ago by Pierre Lindenbaum 166k • written 7.7 years ago by horsedog ▴ 60

0

In addition to @Pierre's note below, consider the following. The times mentioned are US Eastern time.

In order not to overload the E-utility servers, NCBI recommends that users post no more than three URL requests per second and limit large jobs to either weekends or between 9:00 PM and 5:00 AM Eastern time during weekdays. Failure to comply with this policy may result in an IP address being blocked from accessing NCBI. If NCBI blocks an IP address, service will not be restored unless the developers of the software accessing the E-utilities register values of the tool and email parameters with NCBI.

ADD REPLY • link 7.7 years ago by GenoMax 152k
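One way to stay under the three-requests-per-second guideline quoted above is to space calls out on the client side. The following is only a minimal sketch: the `RateLimiter` class and its parameters are hypothetical names made up for illustration, not part of Biopython or the E-utilities.

```python
import time


class RateLimiter:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, max_per_second=3.0, clock=time.monotonic, sleep=time.sleep):
        # Smallest allowed gap between two consecutive requests, in seconds.
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `limiter.wait()` immediately before each request keeps consecutive requests at least a third of a second apart with the default setting.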

0

You can probably retrieve the sequences much more efficiently using blastdbcmd from the BLAST+ suite and a local copy of the nr BLAST database.

ADD REPLY • link 7.7 years ago by GenoMax 152k
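A rough sketch of the local blastdbcmd route, driven from Python to match the script in this thread. It assumes BLAST+ is installed, that a database named `nr` is reachable (for example via `BLASTDB`), and that the identifiers in the input file are ones the database indexes, such as accessions. The file names and the two function names are placeholders.

```python
import subprocess


def build_blastdbcmd(db, id_file, out_file):
    """Build a blastdbcmd command that extracts every entry listed in id_file."""
    return [
        "blastdbcmd",
        "-db", db,                 # name of the local BLAST database
        "-entry_batch", id_file,   # text file with one identifier per line
        "-out", out_file,          # output file (FASTA is the default format)
    ]


def fetch_local(db, id_file, out_file):
    """Run blastdbcmd; raises CalledProcessError if the tool reports failure."""
    subprocess.run(build_blastdbcmd(db, id_file, out_file), check=True)
```

For example, `fetch_local("nr", "ids.txt", "sequences.fasta")` would write the matching sequences to `sequences.fasta` without any network requests.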

score 1 · Answer 1 · 2017-10-31

1

7.7 years ago

Pierre Lindenbaum 166k

https://www.ncbi.nlm.nih.gov/books/NBK25499/

retmax: Total number of UIDs from the retrieved set to be shown in the XML output (default=20).

Increasing retmax allows more of the retrieved UIDs to be included in the XML output, up to a maximum of 100,000 records.

To retrieve more than 100,000 UIDs, submit multiple esearch requests while incrementing the value of retstart

ADD COMMENT • link 7.7 years ago by Pierre Lindenbaum 166k

0

Hi Pierre, is it possible to get just a FASTA file instead of an XML file? Also, I have no idea where to add retmax or retstart. This is my code:

    from Bio import Entrez
    Entrez.email = "A.N.Other@example.com"
    blast = open("file.txt").read()
    handle = Entrez.efetch(db="protein", id=blast, rettype="fasta")
    print(handle.read())

ADD REPLY • link 7.7 years ago by horsedog ▴ 60
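For what it's worth, here is a minimal sketch of fetching FASTA in batches with Biopython rather than passing every UID in a single call. As far as I know, the EFetch documentation describes a cap of 10,000 records per request, which would match the 10,000 sequences observed. The helper names, the batch size of 500, the 0.4 s pause, and the output path are assumptions chosen for illustration, and `file.txt` is assumed to hold one UID per line.

```python
import time


def read_ids(text):
    """Split raw file contents into a list of UIDs (whitespace separated)."""
    return text.split()


def chunks(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_fasta(ids, out_path, email, batch_size=500, delay=0.4):
    """Fetch protein records as FASTA, one batch of UIDs per efetch request."""
    # Imported here so the helpers above can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    with open(out_path, "w") as out:
        for batch in chunks(ids, batch_size):
            handle = Entrez.efetch(
                db="protein",
                id=",".join(batch),   # comma-separated UIDs for this batch
                rettype="fasta",
                retmode="text",       # plain-text FASTA instead of XML
            )
            out.write(handle.read())
            handle.close()
            time.sleep(delay)         # stay under three requests per second
```

Usage would look like `fetch_fasta(read_ids(open("file.txt").read()), "proteins.fasta", "A.N.Other@example.com")`. Setting `rettype="fasta"` with `retmode="text"` is what produces FASTA output, so retmax and retstart are not needed when the UIDs are supplied explicitly in batches.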

0

Hi Pierre, do you know if looping through the records returned maintains an open ftp connection to NCBI? I have a firewall that doesn't allow ftps connections to remain open for long and my loop fails somewhere between 3 and 10 iterations. I suspect this is due to the ftp connection. I don't believe these iterations could hit the 3 requests per second maximum.

ADD REPLY • link 7.7 years ago by yarmda ▴ 40