ID query with Entrez -- invalid ID - urllib.error.HTTPError: HTTP Error 400: Bad Request
4.5 years ago

Hi,

I'm trying to get a FASTA file from a list of IDs, but the list contains a lot of invalid IDs.

When the query reaches an invalid ID, I get an error and the query is interrupted: "urllib.error.HTTPError: HTTP Error 400: Bad Request"

How can I ignore the error and continue the query?

This example stops the query on the second ID:

from urllib.request import urlopen                                          
from urllib.error import HTTPError 
from Bio import Entrez
import time

Entrez.email = "xxxx@xxxxx.com"
IDs = ['AY851612', 'hahdshjhdasdhas', 'AY851612']
for i in IDs:
    try:
        handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
    except HTTPError:
        time.sleep(20)
        handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
    records = Entrez.read(handle)
    #print(records)
    print ("> " + i.rstrip()+" "+records[0]["GBSeq_definition"]+" "+records[0]["GBSeq_taxonomy"]+"\n"+records[0]["GBSeq_sequence"])
    time.sleep(1) # to make sure not many requests go per second to ncbi
gene sequence • 2.3k views
4.5 years ago

You can modify your script to try downloading each sequence record up to three times. If all three attempts fail, the record is skipped.

from urllib.error import URLError
from Bio import Entrez
import time

Entrez.email = "xxxx@xxxxx.com"
IDs = ['AY851612', 'hahdshjhdasdhas', 'AY851612']
max_attempts = 3

for i in IDs:
    handle = None
    # Try each ID up to max_attempts times before giving up on it
    for n in range(max_attempts):
        try:
            handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
            break
        except URLError:  # also catches HTTPError, e.g. 400 Bad Request
            time.sleep(1)
    if handle:
        records = Entrez.read(handle)
        print("> " + i.rstrip()+" "+records[0]["GBSeq_definition"]+" "+records[0]["GBSeq_taxonomy"]+"\n"+records[0]["GBSeq_sequence"])
        time.sleep(1) # to make sure not many requests go per second to ncbi
    else:
        print('Could not download: {}'.format(i))

Output:

> AY851612 Opuntia subulata rpl16 gene, intron; chloroplast Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae; Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae; Austrocylindropuntia
cattaaagaagggggatgcggataaatggaaaggcgaaagaaagaaaaaaatgaatctaaatgatatacgattccactatgtaaggtctttgaatcatatcataaaagacaatgtaataaagcatgaatacagattcacacataattatctgatatgaatctattcatagaaaaaagaaaaaagtaagagcctccggccaataaagactaagagggttggctcaagaacaaagttcattaagagctccattgtagaattcagacctaatcattaatcaagaagcgatgggaacgatgtaatccatgaatacagaagattcaattgaaaaagatcctaatgatcattgggaaggatggcggaacgaaccagagaccaattcatctattctgaaaagtgataaactaatcctataaaactaaaatagatattgaaagagtaaatattcgcccgcgaaaattccttttttattaaattgctcatattttattttagcaatgcaatctaataaaatatatctatacaaaaaaatatagacaaactatatatatataatatatttcaaatttccttatatacccaaatataaaaatatctaataaattagatgaatatcaaagaatctattgatttagtgtattattaaatgtatatcttaattcaatattattattctattcatttttattcattttcaaatttataatatattaatctatatattaatttataattctattctaattcgaattcaatttttaaatattcatattcaattaaaattgaaattttttcattcgcgaggagccggatgagaagaaactctcatgtccggttctgtagtagagatggaattaagaaaaaaccatcaactataaccccaagagaaccaga
Could not download: hahdshjhdasdhas
> AY851612 Opuntia subulata rpl16 gene, intron; chloroplast Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae; Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae; Austrocylindropuntia
cattaaagaagggggatgcggataaatggaaaggcgaaagaaagaaaaaaatgaatctaaatgatatacgattccactatgtaaggtctttgaatcatatcataaaagacaatgtaataaagcatgaatacagattcacacataattatctgatatgaatctattcatagaaaaaagaaaaaagtaagagcctccggccaataaagactaagagggttggctcaagaacaaagttcattaagagctccattgtagaattcagacctaatcattaatcaagaagcgatgggaacgatgtaatccatgaatacagaagattcaattgaaaaagatcctaatgatcattgggaaggatggcggaacgaaccagagaccaattcatctattctgaaaagtgataaactaatcctataaaactaaaatagatattgaaagagtaaatattcgcccgcgaaaattccttttttattaaattgctcatattttattttagcaatgcaatctaataaaatatatctatacaaaaaaatatagacaaactatatatatataatatatttcaaatttccttatatacccaaatataaaaatatctaataaattagatgaatatcaaagaatctattgatttagtgtattattaaatgtatatcttaattcaatattattattctattcatttttattcattttcaaatttataatatattaatctatatattaatttataattctattctaattcgaattcaatttttaaatattcatattcaattaaaattgaaattttttcattcgcgaggagccggatgagaagaaactctcatgtccggttctgtagtagagatggaattaagaaaaaaccatcaactataaccccaagagaaccaga
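
One possible refinement: an HTTP 400 for a malformed accession will not succeed on a retry, so waiting and trying again only slows the loop down. The sketch below retries only errors that are plausibly transient and skips the ID immediately on a 400. The helper names `fetch_with_retry` and `efetch_xml`, the set of retryable status codes, and the injected `fetch` callable are assumptions made for illustration; they are not part of Biopython.

```python
import time
from urllib.error import HTTPError, URLError

# Assumption: these status codes are treated as transient and worth retrying.
RETRYABLE_CODES = {429, 500, 502, 503, 504}


def fetch_with_retry(fetch, identifier, max_attempts=3, delay=1.0):
    """Call fetch(identifier); return its result, or None if the ID is skipped.

    `fetch` is any callable that raises urllib.error.HTTPError or URLError
    when the request fails.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(identifier)
        except HTTPError as err:
            if err.code not in RETRYABLE_CODES:
                return None  # e.g. 400 Bad Request: retrying will not help
        except URLError:
            pass  # network-level failure: fall through and retry
        if attempt < max_attempts - 1:
            time.sleep(delay)
    return None


def efetch_xml(identifier):
    """Fetch one nucleotide record as XML (needs Biopython and network access)."""
    # Imported here so fetch_with_retry itself has no Biopython dependency.
    from Bio import Entrez
    return Entrez.efetch(db="nucleotide", id=identifier, retmode="xml")
```

In the loop above, `handle = fetch_with_retry(efetch_xml, i)` could stand in for the inner retry loop; `Entrez.email` still needs to be set beforehand.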

I also solved it in another way:

from urllib.request import urlopen                                          
from urllib.error import HTTPError 
from Bio import Entrez
import time

Entrez.email = "xxxx@xxxxx.com"
IDs = ['AY851612', 'hjshdaskdhsakjdhaskj', 'AY851612']
for i in IDs:
    try:
        handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
    except HTTPError:
        try:
            time.sleep(30)
            handle = Entrez.efetch(db="nucleotide", id=i, retmode="xml")
        except HTTPError:
            print('Could not download: {}'.format(i))
            continue  # skip this ID only if the retry also failed
    records = Entrez.read(handle)

    print ("> " + i.rstrip()+" "+records[0]["GBSeq_definition"]+" "+records[0]["GBSeq_taxonomy"]+"\n"+records[0]["GBSeq_sequence"])
    time.sleep(1) # to make sure not many requests go per second to ncbi
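
Since the goal is a FASTA file, it may be cleaner to build each entry with a small helper instead of concatenating strings inside `print`. The sketch below assumes the `GBSeq_*` keys used in the snippets above; the function name `to_fasta` and the 70-column line width are choices made here for illustration.

```python
import textwrap


def to_fasta(identifier, record, width=70):
    """Format one parsed GBSeq record (a dict-like object) as a FASTA entry."""
    header = ">{} {} {}".format(
        identifier.strip(),
        record["GBSeq_definition"],
        record["GBSeq_taxonomy"],
    )
    # Wrap the sequence to fixed-width lines, as is conventional for FASTA.
    sequence_lines = textwrap.wrap(record["GBSeq_sequence"], width)
    return "\n".join([header] + sequence_lines)
```

Inside the loop, `print(to_fasta(i, records[0]))` could replace the long `print` call. Note that this version drops the space after `>`, since many tools read the text directly after `>` as the sequence ID.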

Thank you for sharing!


Thank you for sharing! Really helpful.


It works perfectly. Thank you!


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.