Question

Efetch And Biopython

3

12.8 years ago

Sabrewolfy ▴ 80

I'm using BioPython 1.53 with Python 2.6. The following code was working until the recent EFetch updates:

handle = Entrez.efetch(db="nucleotide", rettype="gb", id=seq)

where 'seq' is simply a string with an accession number. Now, however, nothing is being returned. I'm not sure what I need to fix as none of the changes seem to affect what I've coded above.

biopython ncbi python • 8.0k views

ADD COMMENT • link updated 12.8 years ago by Brad Chapman 9.7k • written 12.8 years ago by Sabrewolfy ▴ 80

3

Thanks for the report. This is due to some changes at NCBI and is fixed in the current codebase: http://lists.open-bio.org/pipermail/biopython/2012-February/007743.html There will be a new release in the next week or so with these changes included.

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 12.8 years ago by Brad Chapman 9.7k

1

The NCBI changes report that "EFetch URLs with multiple IDs must be entered as: id=1,2,3" and "EFetch no longer accepts invalid URL parameters, e.g., id=1&id=2&id=3". However, if only one sequence is requested, the URL would be the same ... it would end with id=1.
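To make the point about single IDs concrete, here is a minimal sketch in Python 3 using only the standard library. The helper name `efetch_query` and its default parameter values are assumptions for illustration; it is not part of Biopython and only builds the query string, it does not contact NCBI.

```python
from urllib.parse import urlencode


def efetch_query(ids, db="nuccore", rettype="gb", retmode="text"):
    """Build an EFetch-style query string with IDs joined by commas.

    Hypothetical helper for illustration only; not a Biopython function.
    """
    # Accept a single ID (string or integer) as well as a list of IDs.
    if isinstance(ids, (str, int)):
        ids = [ids]
    id_value = ",".join(str(i) for i in ids)
    # safe="," keeps the commas literal instead of percent-encoding them.
    return urlencode(
        {"db": db, "rettype": rettype, "retmode": retmode, "id": id_value},
        safe=",",
    )


print(efetch_query([1, 2, 3]))  # several IDs end up in one id=1,2,3 parameter
print(efetch_query("1"))        # a single ID still ends with id=1
```

With one ID the comma-joined form and the old form are identical, which is consistent with the observation that the multiple-ID change alone should not break a single-sequence request.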

ADD REPLY • link 12.8 years ago by Sabrewolfy ▴ 80

0

Can you please open a bug report on biopython's bug tracker? https://redmine.open-bio.org/projects/biopython/issues?set_filter=1&tracker_id=1

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 12.8 years ago by Giovanni M Dall'Olio 28k

0

@Brad Chapman: Thanks for the link. I didn't come across that during my searching.

ADD REPLY • link 12.8 years ago by Sabrewolfy ▴ 80

0

I'm still not clear on why this is not working for me. In the example, 'seq' contains just a string for one sequence; I am not passing in a list for multiple sequences. However, it still does not work. I've tried the solutions mentioned in the link above (nuccore/fasta and the join solution to create a comma-separated list), but neither works. My understanding is that the URL change should have no effect if only ONE sequence is requested.

ADD REPLY • link 12.8 years ago by Sabrewolfy ▴ 80

0

As per Brad's full answer below, the problem is that NCBI changed the default retmode.

ADD REPLY • link 12.8 years ago by Peter 6.0k

score 5 · Answer 1 · 2012-02-22

5

12.8 years ago

Brad Chapman 9.7k

In addition to changing how multiple IDs are handled, NCBI also changed some database names and the default return modes. Here is a working query with Biopython 1.58:

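from Bio import Entrez
import urllib2  # needed for the HTTPError check below

# NCBI asks for a contact address with every request
Entrez.email = "test@test.com"

# Valid ID: name the database and the return mode explicitly
handle = Entrez.efetch(db="nuccore", rettype="gb", retmode="text", id=76096369)
print handle.readline()

# Invalid ID: efetch raises HTTPError rather than returning an empty handle
try:
    handle = Entrez.efetch(db="nuccore", rettype="gb", retmode="text",
                           id='wrong')
except urllib2.HTTPError:
    # listed before IOError because HTTPError is a subclass of it
    print "Bad id"
except IOError:
    print "Problem connecting to NCBI"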

This should give:

LOCUS       NM_007726               5807 bp    mRNA    linear ROD 19-FEB-2012
Bad id
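For readers on Python 3, where `urllib2` no longer exists and `HTTPError` lives in `urllib.error`, the same pattern might look like the sketch below. The names `fetch_first_line` and `fake_efetch` are hypothetical; the fake is an offline stand-in for `Entrez.efetch` so the example runs without network access, and the 400 status it raises is an assumption about how a rejected ID would surface.

```python
import io
from urllib.error import HTTPError


def fetch_first_line(efetch, **params):
    """Return the first line of a fetched record, or None if the ID is rejected.

    `efetch` is any callable with an Entrez.efetch-like interface.
    """
    try:
        handle = efetch(**params)
    except HTTPError:
        # Only HTTP-level rejections are handled here; other OSError/IOError
        # cases (for example connection problems) propagate to the caller.
        return None
    try:
        return handle.readline()
    finally:
        handle.close()


def fake_efetch(**params):
    """Offline stand-in for Entrez.efetch (an assumption for this sketch)."""
    if params.get("id") == "wrong":
        raise HTTPError("http://example.invalid", 400, "Bad Request", None, None)
    return io.StringIO("LOCUS       NM_007726\n")


print(fetch_first_line(fake_efetch, db="nuccore", id="76096369"))
print(fetch_first_line(fake_efetch, db="nuccore", id="wrong"))
```

With Biopython installed, `Entrez.efetch` could be passed in place of `fake_efetch`, after setting `Entrez.email`.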

ADD COMMENT • link 12.7 years ago by Brad Chapman 9.7k

0

Thanks. I have tried changing the db and rettype details, but it still does not work with BioPython 1.53. I will try with your specific example.

ADD REPLY • link 12.8 years ago by Sabrewolfy ▴ 80

0

Tested again with "nuccore" but it does not work with BioPython 1.53. Some change NCBI has made has broken this in 1.53 completely. None of the work-arounds have solved it.

ADD REPLY • link 12.8 years ago by Sabrewolfy ▴ 80

0

1.53 is quite old now so there may have been bug fixes over the past several releases. Could you upgrade to 1.58 and retry? If that doesn't work, knowing the ID that is failing for you could help us reproduce the problem.

ADD REPLY • link 12.8 years ago by Brad Chapman 9.7k

0

@Brad Chapman: Thanks, I have manually upgraded to 1.59 and adjusted my code as per your example above. I notice the handle object no longer has peekline which I was using, so I'll have to fix that, but I think the fetching problem is solved now.

ADD REPLY • link 12.7 years ago by Sabrewolfy ▴ 80

0

I was using the length of peekline to determine if a sequence had in fact been returned or not (to check if an invalid accession number had been provided, for example). I'm fetching only one sequence.
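Without peekline, one way to make the same check is to read the first line and keep it for later use. This is a minimal sketch in Python 3; the helper name `starts_with_locus` is hypothetical, and `io.StringIO` objects stand in for the handle that `Entrez.efetch` would return. It relies only on the fact that a GenBank text record begins with a LOCUS line.

```python
import io


def starts_with_locus(handle):
    """Read one line and report whether it looks like the start of a GenBank record.

    Returns (ok, first_line). The line is returned because reading consumes it,
    so the caller needs it back to process the record.
    """
    first_line = handle.readline()
    return first_line.startswith("LOCUS"), first_line


# Stand-ins for a successful fetch and an empty response.
good = io.StringIO("LOCUS       NM_007726               5807 bp    mRNA\n")
empty = io.StringIO("")

print(starts_with_locus(good)[0])
print(starts_with_locus(empty)[0])
```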

ADD REPLY • link 12.7 years ago by Sabrewolfy ▴ 80

0

Although I see now that Entrez.efetch raises an HTTPError if an invalid accession number is given.

ADD REPLY • link 12.7 years ago by Sabrewolfy ▴ 80

0

But there is no HTTPError to catch in a try clause.

ADD REPLY • link 12.7 years ago by Sabrewolfy ▴ 80

0

Yes, Peter has put in a lot of work to make the error handling more transparent so you don't have to manually check for problems. I added example code for catching and identifying bad records and network errors. Hope this helps.

ADD REPLY • link 12.7 years ago by Brad Chapman 9.7k

0

Thanks for the example. I tried the urllib2.HTTPError, but forgot to import urllib2 :)

ADD REPLY • link 12.7 years ago by Sabrewolfy ▴ 80