I am trying to add an option to a Python program that lets the user search for and download the GenBank file for an organism's genome, such as Saccharomyces cerevisiae S288C. I have the following code:

    from Bio import Entrez
    import urllib.request

    Entrez.email = "your.name@example.com"  # NCBI asks for a contact address; replace with your own

    handle = Entrez.esearch(db="assembly", term="Saccharomyces cerevisiae S288C", retmax="100000")
    record = Entrez.read(handle)
    handle.close()
    ids = record['IdList']
    print(f'found {len(ids)} ids')
    # found 2 ids
    print(ids)
    # ['285498', '245838']
    for each in ids:
        esummary_handle = Entrez.esummary(db="assembly", id=each, report="full")
        summary = Entrez.read(esummary_handle)
        esummary_handle.close()
        url = summary['DocumentSummarySet']['DocumentSummary'][0]['FtpPath_GenBank']
        print(url)
        label = url.rstrip('/').split('/')[-1]
        # Join URL parts with '/' (os.path.join would insert '\' on Windows)
        link = f'{url}/{label}_genomic.gbff.gz'
        urllib.request.urlretrieve(link, f'{label}.gbff.gz')

For Saccharomyces cerevisiae S288C, two IDs are found. The first one (285498) has an FtpPath_GenBank and downloads just fine, but the other one (245838), which is the standard genome most people use, has no FtpPath_GenBank in its summary, so the code fails. Searching NCBI manually shows the FTP site for this genome, complete with the address and the file I want: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/146/045/GCA_000146045.2_R64/
I'm really confused as to why the summary record doesn't show the FtpPath_GenBank even though the file is on NCBI. Is there an easier way to go about this? Basically, I'd like the user to be able to search for an organism and download the GenBank file to use later in my program. I am super new to the Entrez suite and find it a little confusing, so any help would be greatly appreciated.
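One way to keep the loop from failing on records like 245838 is to fall back when FtpPath_GenBank is empty or missing. The sketch below is an illustration, not NCBI's documented procedure: it tries FtpPath_GenBank, then FtpPath_RefSeq, and otherwise rebuilds the directory from the accession, following the genomes/all layout visible in the URL quoted in the question. The field names AssemblyAccession and AssemblyName, and the rule for sanitising the assembly name, are assumptions to check against a real esummary record; the helper names are hypothetical.

```python
import re

# Base of NCBI's per-assembly directory tree, as in the URL quoted above.
GENOMES_ALL = "https://ftp.ncbi.nlm.nih.gov/genomes/all"


def ftp_dir_from_accession(accession: str, assembly_name: str) -> str:
    """Rebuild an assembly directory URL from its accession and name.

    e.g. GCA_000146045.2 + R64 -> .../GCA/000/146/045/GCA_000146045.2_R64
    """
    match = re.fullmatch(r"(GC[AF])_(\d{3})(\d{3})(\d{3})\.\d+", accession)
    if match is None:
        raise ValueError(f"unexpected accession format: {accession!r}")
    prefix, part1, part2, part3 = match.groups()
    # Assumption: characters other than letters, digits, '.', '_' and '-'
    # in the assembly name appear as '_' in the directory name.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", assembly_name)
    return f"{GENOMES_ALL}/{prefix}/{part1}/{part2}/{part3}/{accession}_{safe_name}"


def assembly_ftp_dir(doc_summary) -> str:
    """Pick a usable directory URL from one assembly esummary record."""
    # Prefer the GenBank path, then the RefSeq path; either may be empty.
    for key in ("FtpPath_GenBank", "FtpPath_RefSeq"):
        path = str(doc_summary.get(key, "")).strip()
        if path:
            return path
    # Neither path is filled in: rebuild it (assumed field names).
    return ftp_dir_from_accession(str(doc_summary["AssemblyAccession"]),
                                  str(doc_summary["AssemblyName"]))


def gbff_url(ftp_dir: str) -> str:
    """URL of the <label>_genomic.gbff.gz file inside an assembly directory."""
    ftp_dir = ftp_dir.rstrip("/")
    label = ftp_dir.rsplit("/", 1)[-1]
    return f"{ftp_dir}/{label}_genomic.gbff.gz"


# Usage with the loop from the question (needs network access, so left commented):
# for each in ids:
#     handle = Entrez.esummary(db="assembly", id=each)
#     doc = Entrez.read(handle)["DocumentSummarySet"]["DocumentSummary"][0]
#     handle.close()
#     link = gbff_url(assembly_ftp_dir(doc))
#     urllib.request.urlretrieve(link, link.rsplit("/", 1)[-1])
```

Falling back to the RefSeq directory still yields a GenBank-format flat file, but with RefSeq annotation, so the program may want to tell the user which source it used.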
NCBI publishes top-level assembly summary files that cover every genome in its databases: one for GenBank genomes and one for RefSeq genomes. You can parse the FTP paths directly out of these files.
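A minimal sketch of parsing such a summary file follows. It assumes the GenBank file is assembly_summary_genbank.txt under https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/ (the RefSeq one being assembly_summary_refseq.txt), that it is tab-separated with its column names in the last "#" comment line, and that it uses columns named organism_name, version_status and ftp_path, with "na" for missing values. The function names are hypothetical.

```python
import io
import urllib.request

# Assumed location of the GenBank summary file; the RefSeq one is
# assembly_summary_refseq.txt in the same directory.
SUMMARY_URL = ("https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/"
               "assembly_summary_genbank.txt")


def iter_remote_lines(url: str = SUMMARY_URL):
    """Stream the (large) summary file line by line instead of loading it."""
    with urllib.request.urlopen(url) as response:
        yield from io.TextIOWrapper(response, encoding="utf-8")


def parse_assembly_summary(lines):
    """Yield one dict per assembly, keyed by the file's own column names."""
    header = None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            # Comment lines come first; the last one holds the column names.
            header = line.lstrip("#").strip().split("\t")
            continue
        if line and header is not None:
            yield dict(zip(header, line.split("\t")))


def find_gbff_urls(rows, organism: str, latest_only: bool = True):
    """Return *_genomic.gbff.gz URLs for rows whose organism_name matches."""
    wanted = organism.lower()
    urls = []
    for row in rows:
        if wanted not in row.get("organism_name", "").lower():
            continue
        if latest_only and row.get("version_status") != "latest":
            continue
        ftp_path = row.get("ftp_path", "na").rstrip("/")
        if ftp_path == "na":
            continue  # assumed marker for "no FTP directory"
        label = ftp_path.rsplit("/", 1)[-1]
        urls.append(f"{ftp_path}/{label}_genomic.gbff.gz")
    return urls


# Example (downloads a large file, so left commented):
# rows = parse_assembly_summary(iter_remote_lines())
# for url in find_gbff_urls(rows, "Saccharomyces cerevisiae S288C"):
#     print(url)
```

Keying rows by the header names rather than by column position means the parser keeps working if NCBI appends new columns to the file. Because the GenBank summary is large, it may be worth caching a local copy instead of streaming it on every search.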
If you are open to using an existing tool that does this, ncbi-genome-download is an option. It is available on GitHub and can be installed with pip.

Thank you! That file helps a ton! I really appreciate it.
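For the ncbi-genome-download route, a program can shell out to the tool. The sketch below assumes the tool is installed and on PATH, and uses its command-line options --section, --formats, --genera, --assembly-accessions and --output-folder plus a positional taxonomic group such as fungi; confirm these against ncbi-genome-download --help for your installed version. The wrapper function names are hypothetical.

```python
import shutil
import subprocess


def build_ngd_command(group="fungi", genera=None, accessions=None,
                      section="genbank", out_dir="."):
    """Assemble an ncbi-genome-download command line for GenBank flat files."""
    cmd = ["ncbi-genome-download",
           "--section", section,        # "genbank" (GCA_) or "refseq" (GCF_)
           "--formats", "genbank",      # request *_genomic.gbff.gz files
           "--output-folder", out_dir]
    if genera:
        cmd += ["--genera", genera]     # e.g. "Saccharomyces cerevisiae"
    if accessions:
        cmd += ["--assembly-accessions", ",".join(accessions)]
    cmd.append(group)                   # taxonomic group, e.g. "fungi"
    return cmd


def run_ngd(**kwargs):
    """Run the tool if it is on PATH; raises on a non-zero exit status."""
    if shutil.which("ncbi-genome-download") is None:
        raise RuntimeError("ncbi-genome-download is not installed")
    subprocess.run(build_ngd_command(**kwargs), check=True)


# Example (downloads data, so left commented):
# run_ngd(accessions=["GCA_000146045.2"], out_dir="genomes")
```

Building the argument list separately from running it makes the command easy to log or show to the user before anything is downloaded.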