Hi All,
I need to download sequences for a genome using a link similar to http://www.ncbi.nlm.nih.gov/nuccore/ACHI00000000. Downloading involves a few steps:
When this page is opened in a browser, a WGS link is shown; clicking it leads to the following page: http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=ACHI01#contigs
I would like to download all sequences from this page. Is there a nice way to download sequences for multiple genomes using Biopython or any other Python module?
Please advise. Thanks!
I can download the sequences manually by navigating to the download page, but I want to do it from a script, since downloading them by hand for more than 500 bacteria is not feasible. So I was wondering whether there is a way to use Entrez from Biopython to query this kind of page for a given bacterium:
http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=ACHI01#contigs
Oh! Sorry, I misunderstood your original question. You can use Biopython to implement the Entrez commands used in this guide and then save the sequences in FASTA/GenBank format with SeqIO. If you have any questions about using Biopython, let me know.
If you have a list of bacterial search terms or accession IDs in a text file, open the file for reading in Python and, for each line, run the three Entrez commands from the guide and then write the WGS sequence to a file.
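A rough sketch of that loop, in case it helps. The guide's three commands are not reproduced in this thread, so this assumes a plain esearch followed by efetch against the nuccore database; the function names, the one-term-per-line file layout, and the output file naming are made up for illustration. Note that for a WGS master accession such as ACHI00000000, efetch may return only the master record rather than the individual contigs.

```python
from pathlib import Path


def read_terms(list_path):
    """Return the non-empty, stripped lines of a text file (one term per line)."""
    with open(list_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def fetch_fasta(term, email):
    """Search nuccore for `term` and return the matching records as FASTA text."""
    # Imported here so read_terms() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to supply a contact address

    # Step 1: look up the record IDs that match the search term.
    handle = Entrez.esearch(db="nuccore", term=term)
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not ids:
        return ""

    # Step 2: fetch those records as plain-text FASTA.
    handle = Entrez.efetch(db="nuccore", id=",".join(ids),
                           rettype="fasta", retmode="text")
    fasta_text = handle.read()
    handle.close()
    return fasta_text


def download_all(list_path, out_dir, email):
    """Write one FASTA file per term; assumes terms are safe as file names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for term in read_terms(list_path):
        (out_dir / f"{term}.fasta").write_text(fetch_fasta(term, email))
```

Calling `download_all("accessions.txt", "fasta_out", "you@example.org")` would then produce one file per line of the list (both paths and the address are placeholders). With 500+ genomes it is probably worth adding a short pause between requests to stay within NCBI's usage limits.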
WGS has a CGI interface, where you can download complete sets of contigs:
With urllib or requests you can call this URL directly from Python.
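Something along these lines should work with the standard library alone. It is only a sketch: the CGI URL itself is not shown here, so `url_template` and its `{prefix}` placeholder are assumptions that you would replace with the real download URL for the WGS project, and the function names and `.download` suffix are likewise illustrative.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, out_path):
    """Stream the response body at `url` into the file `out_path`."""
    with urllib.request.urlopen(url) as response, open(out_path, "wb") as out:
        # Copy in chunks so large contig sets are not held in memory.
        shutil.copyfileobj(response, out)
    return out_path


def download_many(url_template, prefixes, out_dir, suffix=".download"):
    """Download one file per WGS prefix (e.g. "ACHI01").

    `url_template` is a hypothetical string containing "{prefix}";
    fill in the actual CGI URL before use.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return [
        download(url_template.format(prefix=prefix), out_dir / f"{prefix}{suffix}")
        for prefix in prefixes
    ]
```

For a list of 500+ projects you would read the prefixes from a file and pass them to `download_many`; the same loop works with `requests.get(url, stream=True)` if you prefer that library.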
Thanks piet, I will try it out.