I'm trying to fetch FASTA sequences for a list of IDs, but the list contains many invalid IDs.
When the query reaches an invalid ID, it raises an error and stops: "urllib.error.HTTPError: HTTP Error 400: Bad Request"
How can I ignore the error and continue with the remaining IDs?
This example stops at the second ID:
from urllib.request import urlopen
from urllib.error import HTTPError
from Bio import Entrez
import time
Entrez.email = "xxxx@xxxxx.com"
IDs = ['AY851612', 'hahdshjhdasdhas', 'AY851612']
for i in IDs:
    try:
        handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
    except HTTPError:
        time.sleep(20)
        handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
    records = Entrez.read(handle)
    #print(records)
    print("> " + i.rstrip() + " " + records[0]["GBSeq_definition"] + " " + records[0]["GBSeq_taxonomy"] + "\n" + records[0]["GBSeq_sequence"])
    time.sleep(1)  # to make sure not many requests go per second to ncbi
You can modify your script to try downloading each sequence record up to three times. If all three attempts fail, the record is skipped.
from urllib.request import urlopen
from Bio import Entrez
import time
Entrez.email = "xxxx@xxxxx.com"
IDs = ['AY851612', 'hahdshjhdasdhas', 'AY851612']
max_attempts = 3
for i in IDs:
    handle = None
    # try each ID up to max_attempts times, stopping at the first success
    for n in range(max_attempts):
        try:
            handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
            break
        except Exception:  # any failed request; Ctrl-C still interrupts the script
            time.sleep(1)
    if handle:
        records = Entrez.read(handle)
        print("> " + i.rstrip() + " " + records[0]["GBSeq_definition"] + " " + records[0]["GBSeq_taxonomy"] + "\n" + records[0]["GBSeq_sequence"])
        time.sleep(1)  # to make sure not many requests go per second to ncbi
    else:
        # all attempts failed, so skip this ID
        print('Could not download: {}'.format(i))
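Retrying an invalid ID cannot succeed, so one possible refinement is to skip an ID as soon as the server answers with a client error such as 400, and to retry only errors that may be temporary. The sketch below is an illustration under assumptions, not part of the script above: the helper name `fetch_or_skip` and its `fetch` parameter are hypothetical, and treating every 4xx code except 429 (too many requests) as permanent is an assumption about how the server reports invalid IDs.

```python
import time
from urllib.error import HTTPError


def fetch_or_skip(fetch, record_id, max_attempts=3, delay=1.0):
    """Return fetch(record_id), or None if the ID should be skipped.

    Hypothetical helper: `fetch` is any callable that takes an ID and may
    raise urllib.error.HTTPError.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(record_id)
        except HTTPError as err:
            # Assumption: a 4xx code other than 429 means the request itself
            # is bad (for example an invalid ID), so retrying will not help.
            if 400 <= err.code < 500 and err.code != 429:
                return None
            # Other HTTP errors may be temporary: pause, then try again.
            if attempt < max_attempts - 1:
                time.sleep(delay)
    return None  # every attempt failed
```

In the loop above, this could be called as `handle = fetch_or_skip(lambda x: Entrez.efetch(db="nucleotide", id=x, retmode="xml"), i)`, keeping the existing `if handle:` check to decide whether to read the record or report it as not downloaded.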
If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.
I also solved it in another way:
from urllib.request import urlopen
from urllib.error import HTTPError
from Bio import Entrez
import time
Entrez.email = "xxxx@xxxxx.com"
IDs = ['AY851612', 'hjshdaskdhsakjdhaskj', 'AY851612']
for i in IDs:
    try:
        handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
    except HTTPError:
        try:
            time.sleep(30)
            handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
        except HTTPError:
            print('Could not download: {}'.format(i))
            continue
    records = Entrez.read(handle)
    print("> " + i.rstrip() + " " + records[0]["GBSeq_definition"] + " " + records[0]["GBSeq_taxonomy"] + "\n" + records[0]["GBSeq_sequence"])
    time.sleep(1)  # to make sure not many requests go per second to ncbi
Thank you for sharing!
Thank you for sharing! Really helpful.
It works perfectly. Thank you!