Hello guys, I'm trying to download several genomes from NCBI using the Entrez module in Biopython. I'm using an API key, of course, but the download seems to stop after 20 records. Is it just me? I don't see an explicit limit in the Entrez documentation. I'm expecting around 773 records, which is what I get with this query in NCBI Assembly:
Escherichia[organism] AND complete+genome
If the answer to the first question is yes, how can I implement a nice workaround for the timeout?
As requested, here's the query I used:
from Bio import Entrez
Entrez.email = "my@email"
Entrez.api_key = "mykey"
search_term = "Escherichia[organism] AND complete+genome[title]"
handle = Entrez.esearch(db="nucleotide", term=search_term)
genome_ids = Entrez.read(handle)['IdList']
I currently see an IdList of just 20 records.
You can download them one at a time and make sure not to download a file twice by checking each download.
Even with API keys there are limits per domain on how many connections can be made over a period of time. If someone else from your institution is connecting to NCBI this way, their connections also count towards the total.
Edit: If you are using a proxy server to connect to the internet, the total number of connections counted towards that IP may be higher than you think. If you are sharing an API key with someone else, all connections are counted against the key.
Build in some kind of delay after download of each record so you don't hit the connection limits.
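To make the two suggestions above concrete (skip files you already have, and pause after each record), here is a minimal sketch. The names download_records and fetch_genbank, the .gbk file naming, and the 0.5 second default delay are my own assumptions, not something from NCBI or Biopython. The fetcher uses Entrez.efetch with rettype="gb" and assumes Entrez.email and Entrez.api_key have already been set.

```python
import time
from pathlib import Path


def download_records(record_ids, fetch_one, out_dir, delay=0.5):
    """Fetch each ID once, pausing between requests; return the IDs actually fetched."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for record_id in record_ids:
        target = out_dir / f"{record_id}.gbk"
        # Skip anything already saved with non-zero size, so a rerun does not download twice.
        if target.exists() and target.stat().st_size > 0:
            continue
        text = fetch_one(record_id)
        # Write under a temporary name first so an interrupted run leaves no half-written .gbk.
        tmp = target.with_name(target.name + ".part")
        tmp.write_text(text)
        tmp.replace(target)
        fetched.append(record_id)
        time.sleep(delay)  # pause before the next request
    return fetched


def fetch_genbank(record_id):
    """Assumed fetcher: one GenBank record as text via Biopython's Entrez.efetch."""
    from Bio import Entrez  # imported here so the helper above works without Biopython

    handle = Entrez.efetch(db="nucleotide", id=record_id, rettype="gb", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()
```

You would call it as download_records(genome_ids, fetch_genbank, "genomes"). If a run gets interrupted, running the same call again should only fetch the records that are still missing.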
Could you please post your entire query? The API key applies only to records fetched using eUtilities. If your ultimate goal is to download genome sequence data, you should use FTP instead. If you can provide more information about what you are trying to download and the query you are using, I will be able to help you.
Added. Using a timeout with the sleep function could be a nice workaround, but I don't know how it could be implemented in this case. I don't want to use FTP here because Entrez gives me the gbk file directly, without having to extract it from gzip archives.
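On the 20-record IdList itself: this may not be a rate limit at all. ESearch returns only 20 IDs per call unless you pass retmax, so a call without it would stop at 20 regardless of the API key. Below is a minimal sketch that pages through the results with retstart and retmax until the reported Count is reached. The names collect_all_ids and entrez_search_page and the page size of 500 are hypothetical, chosen just for illustration.

```python
def collect_all_ids(search_page, page_size=500):
    """Page through a search until every ID reported by Count has been collected."""
    ids = []
    start = 0
    while True:
        result = search_page(start, page_size)
        batch = list(result["IdList"])
        ids.extend(batch)
        start += len(batch)
        # Stop when the server has no more IDs or we have reached the reported total.
        if not batch or start >= int(result["Count"]):
            return ids


def entrez_search_page(term, db="nucleotide"):
    """Assumed adapter: returns a function that runs one esearch page via Biopython."""
    from Bio import Entrez  # imported here so collect_all_ids works without Biopython

    def search_page(retstart, retmax):
        handle = Entrez.esearch(db=db, term=term, retstart=retstart, retmax=retmax)
        try:
            return Entrez.read(handle)
        finally:
            handle.close()

    return search_page
```

With that in place, genome_ids = collect_all_ids(entrez_search_page(search_term)) should give the full list. For a result set this small, simply adding retmax=1000 to the original Entrez.esearch call would probably be enough.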