Dear all,
I wrote a script to retrieve the corresponding nucleotide CDS sequences for a list of NCBI protein identifiers, using Entrez.efetch in Python 3.7 (Anaconda 3). The script worked well a few weeks ago, but now for some reason it doesn't. Let me show you the code:
from Bio import Entrez
import re

ids = ['XP_021798999.1', 'XP_003909393.1', 'XP_004781165.1']
Entrez.email = '<censored>'
# Fetch the CDS nucleotide sequences for these protein accessions
handle = Entrez.efetch(db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')
record = handle.read()
# Collapse blank lines between records
record = re.sub('\n\n', '\n', record)
While this used to work, now it gives me the following error:
Entrez.email= '<censored>'
handle = Entrez.efetch(db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')
Traceback (most recent call last):
File "<ipython-input-14-a939b978098e>", line 2, in <module>
handle = Entrez.efetch(db='nuccore', id=ids, rettype='fasta_cds_na', retmode='xml')
File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 184, in efetch
return _open(cgi, variables, post=post)
File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 545, in _open
raise exception
File "/home/guille/anaconda3/lib/python3.7/site-packages/Bio/Entrez/__init__.py", line 543, in _open
handle = _urlopen(cgi)
File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/home/guille/anaconda3/lib/python3.7/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
HTTPError: Bad Request
I tried different combinations (e.g. other db and id parameters, just to test whether it's a general problem or not), and some of them worked, but unfortunately none of the working ones are useful for me. I also updated Biopython (to version 1.73) in case that was the issue, but I get the same result.
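For anyone who wants to reproduce this, a test along these lines is what I mean by trying combinations (a minimal sketch, not my exact test code; it just tries each id and retmode separately and catches the HTTPError):

from urllib.error import HTTPError
from Bio import Entrez

Entrez.email = '<censored>'

# Sketch: try each accession on its own, with two retmodes, to see whether
# the Bad Request is tied to a specific id or parameter combination.
for pid in ['XP_021798999.1', 'XP_003909393.1', 'XP_004781165.1']:
    for mode in ('xml', 'text'):
        try:
            h = Entrez.efetch(db='nuccore', id=pid, rettype='fasta_cds_na', retmode=mode)
            print(pid, mode, 'OK, first line:', h.readline().strip())
            h.close()
        except HTTPError as err:
            print(pid, mode, 'failed:', err)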
I'd really appreciate your thoughts.
Best,
When doing programmatic queries against NCBI, please build in a sleep interval. Have you also signed up for an NCBI API key? If you are not using one, your queries are further limited to 3 queries per second.

Dear genomax, I already signed up for an API key, and I run other scripts (in R, though) with your point in mind. However, in my example there are no for loops, and as far as I know it would count as a single request, right? If that is the case, there must be something else...
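For reference, this is roughly how I would wire the key and a delay in on the Python side if I do end up looping (a minimal sketch; the key string is a placeholder, and the db/rettype here are just illustrative):

from Bio import Entrez
import time

Entrez.email = '<censored>'
Entrez.api_key = 'MY_NCBI_API_KEY'  # placeholder, not a real key

# With an API key NCBI allows up to 10 requests per second (3 without one),
# so a short pause between successive calls stays well under the limit.
for pid in ['XP_021798999.1', 'XP_003909393.1', 'XP_004781165.1']:
    handle = Entrez.efetch(db='protein', id=pid, rettype='fasta', retmode='text')
    print(handle.read())
    handle.close()
    time.sleep(0.4)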
Queries seem to be working:
Hi Carambakaracho, I tried it via the browser, following several examples in the docs, and those worked. However, using my example ids didn't work. I suspect it is something related to using 'nuccore' in combination with XP/NP ids... It shouldn't be that, because it worked just fine until recently, but I'm starting to think that perhaps they changed something on NCBI's side :/
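In case anyone wants to reproduce the browser test, the efetch URL I mean is along these lines (standard EUtils base URL with my parameters; only the first id shown):

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=XP_021798999.1&rettype=fasta_cds_na&retmode=xml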