I'm retrieving protein sequences by UID using efetch. I have more than 30,000 UIDs, but after running my Python script I got exactly 10,000 sequences. Is there a limit on the number of records that can be retrieved, such as 10,000? Is it possible to get more than 30,000 at one time?
Thanks
In addition to @Pierre's note below consider this. Time mentioned below is US East coast time.
In order not to overload the E-utility servers, NCBI recommends that
users post no more than three URL requests per second and limit large
jobs to either weekends or between 9:00 PM and 5:00 AM Eastern time
during weekdays. Failure to comply with this policy may result in an
IP address being blocked from accessing NCBI. If NCBI blocks an IP
address, service will not be restored unless the developers of the
software accessing the E-utilities register values of the tool and
email parameters with NCBI.
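One way to stay under the three-requests-per-second guideline quoted above is to put a small throttle in front of every request. This is only a sketch: the `Throttle` class and its parameters are hypothetical names made up for illustration, not part of Biopython or any NCBI tool.

```python
import time


class Throttle:
    """Spaces out calls so that at most `max_per_second` happen per second.

    Hypothetical helper for illustration; not part of Biopython or NCBI tooling.
    """

    def __init__(self, max_per_second=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before each E-utilities request. If I remember correctly, recent Biopython versions already pace `Bio.Entrez` requests internally, so this matters mostly if you build the URLs yourself.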
Hi Pierre, is it possible to get just a FASTA file instead of an XML file? Also, I have no idea where to add retmax or retstart. This is my code:
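from Bio import Entrez

Entrez.email = "A.N.Other@example.com"  # NCBI asks for a contact address
# file.txt holds the UIDs (assumed here to be one per line)
with open("file.txt") as f:
    blast = [line.strip() for line in f if line.strip()]
handle = Entrez.efetch(db="protein", id=blast, rettype="fasta", retmode="text")
print(handle.read())
handle.close()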
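As far as I know, a single efetch call returns at most 10,000 records, which would explain the count in the question. A common workaround is to split the UID list into smaller batches and fetch them one after another. Below is a minimal sketch of that idea. The helper names (`batches`, `fetch_fasta_in_batches`, `entrez_fetch_batch`), the batch size of 200, and the pause length are my own assumptions, not anything prescribed by NCBI or Biopython.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids` holding at most `size` items each."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def fetch_fasta_in_batches(
    ids: Sequence[str],
    fetch_batch: Callable[[List[str]], str],
    batch_size: int = 200,   # assumed value; tune to taste
    pause: float = 0.4,      # assumed pause (seconds) between requests
) -> Iterator[str]:
    """Call `fetch_batch` once per chunk and yield whatever text it returns."""
    for chunk in batches(ids, batch_size):
        yield fetch_batch(chunk)
        time.sleep(pause)


def entrez_fetch_batch(chunk: List[str]) -> str:
    """Fetch one chunk of protein records as FASTA text via Biopython."""
    # Imported here so the helpers above can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = "A.N.Other@example.com"  # replace with your own address
    handle = Entrez.efetch(
        db="protein", id=",".join(chunk), rettype="fasta", retmode="text"
    )
    try:
        return handle.read()
    finally:
        handle.close()
```

With the UIDs loaded into a list called `ids`, something like the following would write everything to one file:

```python
with open("sequences.fasta", "w") as out:
    for fasta_text in fetch_fasta_in_batches(ids, entrez_fetch_batch):
        out.write(fasta_text)
```

Each batch is an independent request, so a failed batch can be retried without starting over.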
Hi Pierre, do you know if looping through the records returned maintains an open ftp connection to NCBI? I have a firewall that doesn't allow ftps connections to remain open for long and my loop fails somewhere between 3 and 10 iterations. I suspect this is due to the ftp connection. I don't believe these iterations could hit the 3 requests per second maximum.
You can probably retrieve sequences much more efficiently using blastdbcmd from the blast+ suite and a local copy of the nr blast database.
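A rough sketch of how that could look when driven from Python, to stay in the same language as the script above. It assumes that `blastdbcmd` is on your PATH, that a local `nr` database is installed where BLAST+ can find it, and that the identifiers in your file are ones that database indexes (typically accessions). The function names and file names are placeholders.

```python
import subprocess
from typing import List


def blastdbcmd_command(id_file: str, out_file: str, db: str = "nr") -> List[str]:
    """Build a blastdbcmd call that dumps the listed entries as FASTA."""
    return [
        "blastdbcmd",
        "-db", db,                 # name or path of the local BLAST database
        "-dbtype", "prot",         # protein database
        "-entry_batch", id_file,   # text file with one identifier per line
        "-outfmt", "%f",           # %f = sequence in FASTA format
        "-out", out_file,
    ]


def run_blastdbcmd(id_file: str, out_file: str, db: str = "nr") -> None:
    """Run blastdbcmd and raise CalledProcessError if it exits with an error."""
    subprocess.run(blastdbcmd_command(id_file, out_file, db), check=True)
```

For example, `run_blastdbcmd("file.txt", "sequences.fasta")`. Since everything is read from local disk, the NCBI request limits do not apply.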