where 'seq' is simply a string with an accession number. Now, however, nothing is being returned. I'm not sure what I need to fix as none of the changes seem to affect what I've coded above.
The NCBI changes report that "EFetch URLs with multiple IDs must be entered as: id=1,2,3" and "EFetch no longer accepts invalid URL parameters, e.g., id=1&id=2&id=3". However, if only one sequence is requested, the URL would be the same ... it would end with id=1.
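To make the quoted rule concrete, here is a small standard-library sketch that builds an EFetch URL by hand. The helper name `build_efetch_url` and its defaults are assumptions for illustration and are not part of Biopython. It shows that a single ID and a list of IDs both end up in one `id=` parameter, and it sets `rettype` and `retmode` explicitly rather than relying on server defaults.

```python
from urllib.parse import urlencode

# Standard NCBI E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(ids, db="nuccore", rettype="fasta", retmode="text"):
    """Build an EFetch URL with all IDs in ONE comma-separated id parameter.

    Hypothetical helper for illustration; `ids` may be a single string
    or a list of strings.
    """
    if isinstance(ids, str):
        ids = [ids]
    params = {
        "db": db,
        "id": ",".join(ids),   # id=1,2,3 rather than id=1&id=2&id=3
        "rettype": rettype,
        "retmode": retmode,    # stated explicitly, not left to the default
    }
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return EFETCH_URL + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    print(build_efetch_url("1"))
    print(build_efetch_url(["1", "2", "3"]))
```

With a single ID the `id=` part is identical under the old and new rules, so a single-sequence request that stops working is more likely affected by the database name or return mode than by the ID format.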
I'm still not clear on why this is not working for me. In the example, 'seq' contains just a string for one sequence -- I am not passing in a list for multiple sequences. However, it still does not work. I've tried the various solutions mentioned in the link above (nuccore/fasta and the join solution to create a comma-separated list), but neither works. My understanding is that the URL change should have no effect if only ONE sequence is requested.
In addition to the multiple ID change behavior, NCBI also changed some database names and the default return modes. Here is a working query with Biopython 1.58:
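A minimal sketch of a query along those lines (not necessarily the exact snippet from the answer), assuming Biopython is installed. The function names `efetch_kwargs` and `fetch_fasta_record`, and the accession and email you pass in, are placeholders. The important parts are the `nuccore` database name and the explicit `rettype="fasta"`, `retmode="text"`.

```python
def efetch_kwargs(accession):
    """Keyword arguments for Entrez.efetch that request plain-text FASTA.

    Hypothetical helper; the point is that db, rettype and retmode are
    all given explicitly instead of relying on NCBI's defaults.
    """
    return {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }


def fetch_fasta_record(accession, email):
    """Fetch one sequence from NCBI and parse it into a SeqRecord."""
    # Imported here so the module still loads without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(**efetch_kwargs(accession))
    try:
        # SeqIO.read expects exactly one record and raises ValueError otherwise.
        return SeqIO.read(handle, "fasta")
    finally:
        handle.close()


# Example call (needs network access and a real accession):
# record = fetch_fasta_record("YOUR_ACCESSION", "you@example.org")
# print(record.id, len(record.seq))
```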
Tested again with "nuccore" but it does not work with BioPython 1.53. Some change NCBI has made has broken this in 1.53 completely. None of the work-arounds have solved it.
1.53 is quite old now so there may have been bug fixes over the past several releases. Could you upgrade to 1.58 and retry? If that doesn't work, knowing the ID that is failing for you could help us reproduce the problem.
@Brad Chapman: Thanks, I have manually upgraded to 1.59 and adjusted my code as per your example above. I notice the handle object no longer has peekline which I was using, so I'll have to fix that, but I think the fetching problem is solved now.
I was using the length of peekline to determine if a sequence had in fact been returned or not (to check if an invalid accession number had been provided, for example). I'm fetching only one sequence.
Yes, Peter has put in a lot of work to make the error handling more transparent so you don't have to manually check for problems. I added example code for catching and identifying bad records and network errors. Hope this helps.
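As a rough sketch of that kind of check (not the example code referred to above), the following replaces a `peekline`-style length test with explicit error handling. It uses only the standard library. The `fetch_text` argument is an assumed callable that returns the raw FASTA text for an accession, so the logic can be tried without network access. The names `parse_single_fasta` and `fetch_checked` are hypothetical.

```python
from urllib.error import URLError


def parse_single_fasta(text):
    """Return (header, sequence) from FASTA text holding exactly one record.

    Raises ValueError if the text is empty, is not FASTA, or holds
    more than one record.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("no FASTA record returned")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("more than one FASTA record returned")
    return lines[0][1:], "".join(lines[1:])


def fetch_checked(fetch_text, accession):
    """Fetch one record and report problems instead of returning nothing.

    Returns (record, error): exactly one of the two is None.
    """
    try:
        return parse_single_fasta(fetch_text(accession)), None
    except URLError as err:
        # URLError also covers HTTPError, e.g. a rejected request.
        return None, "network error: %s" % err
    except ValueError as err:
        return None, "bad record: %s" % err
```

With Biopython, `fetch_text` could be something like `lambda acc: Entrez.efetch(db="nuccore", id=acc, rettype="fasta", retmode="text").read()`, and `SeqIO.read` could stand in for the hand-written parser, since it also raises `ValueError` when no record is found.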
Thanks for the report. This is due to some changes at NCBI and is fixed in the current codebase: http://lists.open-bio.org/pipermail/biopython/2012-February/007743.html There will be a new release in the next week or so with these changes included.
Can you please open a bug report on Biopython's bug tracker? https://redmine.open-bio.org/projects/biopython/issues?set_filter=1&tracker_id=1
@Brad Chapman: Thanks for the link. Didn't come across that during my searching.
As per Brad's full answer, the problem is that NCBI changed the default retmode.