9.8 years ago
moranr
Hi,
My goal is to download all the complete nucleotide genomes for metazoans.
I can get about half of these very easily from Ensembl Metazoa. However, for the rest of the species I think I need to use the Entrez Utilities on NCBI with Python.
My problem is selecting only completed genomes. It would even be fine if all assemblies were downloaded for each species. I want a single fasta/gb file per genome/assembly.
At the moment I am:
import csv
from Bio import Entrez

Entrez.email = 'you@example.com'  # placeholder address; NCBI asks for a contact email

# Search Entrez and get the IDs for each species
with open('SpeciesList.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for sp in reader:
        search_term = str(sp[0]) + '[orgn] complete genome[title] NOT mitochondria[title]'
        handle = Entrez.esearch(db='genome', term=search_term)
        genome_ids = Entrez.read(handle)['IdList']
        handle.close()
        # Get gb files using the IDs
        for genome_id in genome_ids:
            record = Entrez.efetch(db="nucleotide", id=genome_id, rettype="gb", retmode="text")
            filename = 'genBankRecord_{}.gb'.format(genome_id)
            print('Writing:{}'.format(filename))
            with open(filename, 'w') as f:
                f.write(record.read())
# Parse gb files
My problem is only grabbing gb files for completed genomes. Can anyone help with my search query here please?
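One direction that might be worth trying, sketched here under assumptions rather than tested against NCBI: search the `assembly` database instead of `genome`, since the IDs returned by a `genome` search are not nucleotide IDs and so may not fetch the intended records from `nucleotide`. In the sketch below, the helper names (`build_assembly_query`, `genomic_fasta_url`, `find_assembly_urls`) are hypothetical, the filter terms are assumed to match those offered by the NCBI Assembly web search, and the `FtpPath_RefSeq` / `FtpPath_GenBank` summary fields and the `_genomic.fna.gz` file naming are assumed to follow the usual NCBI genomes FTP layout.

```python
def build_assembly_query(species, levels=("complete genome", "chromosome level")):
    """Build a search term for the Entrez 'assembly' database.

    Assumption: these [filter] names match the NCBI Assembly web search.
    """
    level_clause = " OR ".join('"{}"[filter]'.format(lv) for lv in levels)
    return '"{}"[Organism] AND "latest"[filter] AND ({})'.format(species, level_clause)


def genomic_fasta_url(ftp_path):
    """Turn an assembly FTP directory into the URL of its genomic FASTA file.

    Assumption: the file is named <directory name>_genomic.fna.gz.
    """
    ftp_path = ftp_path.rstrip("/")
    basename = ftp_path.rsplit("/", 1)[-1]
    return "{}/{}_genomic.fna.gz".format(ftp_path, basename)


def find_assembly_urls(species, email, retmax=20):
    """Query NCBI (network access needed) and return one FASTA URL per assembly."""
    # Imported here so the two helpers above can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.esearch(db="assembly", term=build_assembly_query(species),
                            retmax=retmax)
    assembly_ids = Entrez.read(handle)["IdList"]
    handle.close()

    urls = []
    for assembly_id in assembly_ids:
        handle = Entrez.esummary(db="assembly", id=assembly_id, report="full")
        summary = Entrez.read(handle, validate=False)
        handle.close()
        doc = summary["DocumentSummarySet"]["DocumentSummary"][0]
        # Prefer the RefSeq copy if present, otherwise fall back to GenBank.
        ftp_path = doc["FtpPath_RefSeq"] or doc["FtpPath_GenBank"]
        if ftp_path:
            urls.append(genomic_fasta_url(ftp_path))
    return urls
```

Calling `find_assembly_urls(str(sp[0]), 'you@example.com')` inside the species loop would then give one download link per assembly, which would match the "single file per assembly" goal. If GenBank flat files are preferred over FASTA, the same directory is assumed to also hold a `_genomic.gbff.gz` file.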