Efetch And Biopython
12.8 years ago
Sabrewolfy ▴ 80

I'm using BioPython 1.53 with Python 2.6. The following code was working until the recent EFetch updates:

handle = Entrez.efetch(db="nucleotide", rettype="gb", id=seq)

where 'seq' is simply a string with an accession number. Now, however, nothing is being returned. I'm not sure what I need to fix as none of the changes seem to affect what I've coded above.

biopython ncbi python • 8.1k views

Thanks for the report. This is due to some changes at NCBI and is fixed in the current codebase: http://lists.open-bio.org/pipermail/biopython/2012-February/007743.html There will be a new release in the next week or so with these changes included.


The NCBI changes report that "EFetch URLs with multiple IDs must be entered as: id=1,2,3" and "EFetch no longer accepts invalid URL parameters, e.g., id=1&id=2&id=3". However, if only one sequence is requested, the URL would be the same ... it would end with id=1.
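To illustrate the point about single versus multiple IDs, here is a minimal sketch (Python 3, standard library only) of how the two URL styles differ. The helper names are hypothetical and the code is not taken from Biopython's source; it only builds URLs against the public E-utilities efetch endpoint and does not send any request.

```python
from urllib.parse import urlencode

# Public E-utilities endpoint; the helpers below are illustrative, not Biopython code.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def old_style_url(ids, db="nucleotide", rettype="gb"):
    """Repeat the id parameter once per ID (id=1&id=2&id=3), the form NCBI now rejects."""
    params = [("db", db), ("rettype", rettype)] + [("id", i) for i in ids]
    return EFETCH_URL + "?" + urlencode(params)


def new_style_url(ids, db="nucleotide", rettype="gb"):
    """Send all IDs as one comma-separated id parameter (id=1,2,3)."""
    params = [("db", db), ("rettype", rettype), ("id", ",".join(ids))]
    # safe="," keeps the commas literal instead of percent-encoding them
    return EFETCH_URL + "?" + urlencode(params, safe=",")


print(old_style_url(["1", "2", "3"]))
print(new_style_url(["1", "2", "3"]))
# With a single ID the two forms produce the same URL.
print(old_style_url(["1"]) == new_style_url(["1"]))
```

Because the single-ID URLs are identical, the multiple-ID rule on its own would not explain a failure when only one accession is requested.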


Can you please open a bug report on biopython's bug tracker? https://redmine.open-bio.org/projects/biopython/issues?set_filter=1&tracker_id=1


@Brad Chapman: Thanks for the link. Didn't come across that during my searching.


I'm still not clear on why this is not working for me. In the example, 'seq' contains just a string for one sequence; I am not passing in a list for multiple sequences. However, it still does not work. I've tried the solutions mentioned in the link above (nuccore/fasta and the join solution to create a comma-separated list), but neither works. My understanding is that the URL change should have no effect if only ONE sequence is requested.


As per Brad's full answer below, the problem is that NCBI changed the default retmode.

12.8 years ago

In addition to changing how multiple IDs are handled, NCBI also changed some database names and the default return modes. Here is a working query with Biopython 1.58:

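from Bio import Entrez
import urllib2

# NCBI asks for a contact address with each request
Entrez.email = "test@test.com"
# Name the database and return mode explicitly rather than relying on defaults
handle = Entrez.efetch(db="nuccore", rettype="gb", retmode="text", id=76096369)
print handle.readline()

try:
    handle = Entrez.efetch(db="nuccore", rettype="gb", retmode="text",
                           id='wrong')
# HTTPError is a subclass of IOError, so it has to be caught first
except urllib2.HTTPError:
    print "Bad id"
except IOError:
    print "Problem connecting to NCBI"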

Should give:

LOCUS       NM_007726               5807 bp    mRNA    linear ROD 19-FEB-2012
Bad id
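
For readers on Python 3, where `urllib2` no longer exists and `HTTPError` lives in `urllib.error`, the same pattern might look like the sketch below. `first_line_or_error` is a hypothetical helper, and the fetch function is passed in as an argument so that the error handling can be tried without Biopython or a network connection. That a bad ID surfaces as an `HTTPError` is taken from the example above and may depend on how NCBI responds.

```python
import io
import urllib.error


def first_line_or_error(fetch, **params):
    """Call fetch(**params) and return the first line, or a short error message.

    `fetch` is any callable returning a file-like handle, e.g. Bio.Entrez.efetch.
    """
    try:
        handle = fetch(**params)
    except urllib.error.HTTPError:
        # HTTPError is a subclass of OSError, so it must be caught first.
        return "Bad id"
    except OSError:
        # Covers URLError and other connection-level failures.
        return "Problem connecting to NCBI"
    try:
        return handle.readline()
    finally:
        handle.close()


# Stand-ins for Entrez.efetch, used here only to exercise each branch offline.
def fake_ok(**params):
    return io.StringIO("LOCUS       NM_007726\n")


def fake_bad_id(**params):
    raise urllib.error.HTTPError("http://example.invalid", 400, "Bad Request", None, None)


def fake_offline(**params):
    raise urllib.error.URLError("no route to host")


print(first_line_or_error(fake_ok, id="76096369"))
print(first_line_or_error(fake_bad_id, id="wrong"))
print(first_line_or_error(fake_offline, id="76096369"))
```

In real use you would import `Entrez` from `Bio`, set `Entrez.email`, and call `first_line_or_error(Entrez.efetch, db="nuccore", rettype="gb", retmode="text", id="76096369")`.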

Thanks. I have tried changing the db and rettype details, but it still does not work with BioPython 1.53. I will try with your specific example.


Tested again with "nuccore" but it does not work with BioPython 1.53. Some change NCBI has made has broken this in 1.53 completely. None of the work-arounds have solved it.


1.53 is quite old now so there may have been bug fixes over the past several releases. Could you upgrade to 1.58 and retry? If that doesn't work, knowing the ID that is failing for you could help us reproduce the problem.


@Brad Chapman: Thanks, I have manually upgraded to 1.59 and adjusted my code as per your example above. I notice the handle object no longer has peekline which I was using, so I'll have to fix that, but I think the fetching problem is solved now.


I was using the length of peekline to determine if a sequence had in fact been returned or not (to check if an invalid accession number had been provided, for example). I'm fetching only one sequence.
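
Without `peekline`, one way to get the same check is to read the first line yourself and then rebuild a handle that still contains it. The sketch below (Python 3) assumes a text-mode handle and a record small enough to hold in memory; `peek_first_line` is a hypothetical helper, not part of Biopython, and the sample record text is a stand-in.

```python
import io


def peek_first_line(handle):
    """Return (first_line, new_handle); new_handle yields the full content again."""
    first = handle.readline()
    rest = handle.read()
    return first, io.StringIO(first + rest)


# A StringIO stands in for the handle returned by Entrez.efetch.
record = io.StringIO("LOCUS       NM_007726\nDEFINITION  example\n")
first, record = peek_first_line(record)

if len(first) == 0:
    print("Nothing returned")
else:
    print(first.rstrip())
    print(len(record.read().splitlines()), "lines still available")
```

An empty first line means nothing came back, which is the condition the length check on `peekline` was testing for.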


Although I see now that Entrez.efetch raises an HTTPError if an invalid accession number is given.


But there is no HTTPError to catch in a try clause.


Yes, Peter has put in a lot of work to make the error handling more transparent so you don't have to manually check for problems. I added example code for catching and identifying bad records and network errors. Hope this helps.


Thanks for the example. I tried the urllib2.HTTPError, but forgot to import urllib2 :)
