I have this piece of Python code, which uses the Biopython module:
from Bio import Entrez
Entrez.email = "email@any_email.com"
organisms = ["Diaphus anderseni"]
genes = ["H3"]
specs = {}
acclist = {}
for org in organisms:
    for gene in genes:
        query = org + "[organism] AND " + gene + "[gene]"
        res = Entrez.esearch(db="nucleotide", term=query, retmax=10000)
        rec = Entrez.read(res)
        res.close()
        if not rec["IdList"]:
            continue  # no hits for this organism/gene pair; efetch needs at least one ID
        res = Entrez.efetch(db="nucleotide", id=rec["IdList"], retmode="xml")
        for record in Entrez.read(res):
            speciesName = record["GBSeq_organism"]
            accn = record["GBSeq_accession-version"]
            if accn in acclist:
                acclist[accn].append(speciesName)
            else:
                acclist[accn] = [speciesName]
        res.close()
It gives me a dictionary with two entries, like this:
{'KJ555688.1': ['Diaphus anderseni'], 'KJ555689.1': ['Diaphus anderseni']}
I would also like to build a second dictionary that holds the 'specimen_voucher' information, so that it looks like this:
{'KJ555688.1': ['SIO:10-169'], 'KJ555689.1': ['SIO:10-170']}
I built this last dictionary by hand, looking up the complete GenBank records for KJ555688 and KJ555689. I would like to do this in Python instead, so that it scales to hundreds of accession numbers. Any advice would be greatly appreciated. Thanks in advance for your time and help.
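One possible approach, sketched under assumptions: in the GBSeq XML returned by `Entrez.efetch(..., retmode="xml")`, the voucher is normally stored as a qualifier of the "source" feature, i.e. an entry in `GBSeq_feature-table` whose `GBFeature_quals` contains `GBQualifier_name == "specimen_voucher"`. The helper names below (`voucher_from_gbseq`, `vouchers_by_accession`, `chunked`) are hypothetical, and the demo record is hand-built from the values in this question rather than fetched from NCBI, so the block runs without network access or Biopython installed.

```python
from typing import Dict, Iterable, Iterator, List


def voucher_from_gbseq(record: dict) -> List[str]:
    """Return all specimen_voucher values in one GBSeq record (as parsed by Entrez.read)."""
    vouchers = []
    for feature in record.get("GBSeq_feature-table", []):
        # Assumption: the voucher is a qualifier of the "source" feature.
        if feature.get("GBFeature_key") != "source":
            continue
        for qual in feature.get("GBFeature_quals", []):
            if qual.get("GBQualifier_name") == "specimen_voucher":
                vouchers.append(qual.get("GBQualifier_value", ""))
    return vouchers


def vouchers_by_accession(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Map accession.version to the list of specimen_voucher values found for it."""
    result: Dict[str, List[str]] = {}
    for record in records:
        accn = record["GBSeq_accession-version"]
        result.setdefault(accn, []).extend(voucher_from_gbseq(record))
    return result


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs, e.g. one slice per efetch call."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Offline demo: a hand-built record that mimics the GBSeq structure.
example = [{
    "GBSeq_accession-version": "KJ555688.1",
    "GBSeq_feature-table": [{
        "GBFeature_key": "source",
        "GBFeature_quals": [
            {"GBQualifier_name": "organism", "GBQualifier_value": "Diaphus anderseni"},
            {"GBQualifier_name": "specimen_voucher", "GBQualifier_value": "SIO:10-169"},
        ],
    }],
}]
print(vouchers_by_accession(example))  # {'KJ555688.1': ['SIO:10-169']}
```

To use it in the loop above, read the efetch handle only once (for example `records = Entrez.read(res)`), then iterate over `records` for the species names and call `vouchers_by_accession(records)` for the vouchers. For hundreds of IDs, one option is to call `Entrez.efetch` once per `chunked(rec["IdList"], 200)` slice; the batch size of 200 is an arbitrary choice here, not an NCBI requirement taken from this post. Records without a voucher simply map to an empty list.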