How can I obtain specimen_voucher information for each accession number in a python dictionary prepared with biopython?
1
0
Entering edit mode
6.9 years ago
sw.knudsen • 0

With this piece of code in python using the module biopython:

from Bio import Entrez
Entrez.email = "email@any_email.com"

organisms=["Diaphus anderseni"]
genes = ["H3"]
specs = {}
acclist = {}
for org in organisms:
    for gene in genes:
        query= org+"[organism] AND "+gene+"[gene]"
        res = Entrez.esearch(db="nucleotide", term=query, retmax=10000)
        rec = Entrez.read(res)
        res = Entrez.efetch(db="nucleotide", id=rec["IdList"],  retmode = "xml")
        for record in Entrez.read(res):
            speciesName = record["GBSeq_organism"]
            accn = record["GBSeq_accession-version"]
            if accn in acclist:
                acclist[accn].append(speciesName)
            else:
                acclist[accn] = [speciesName]

I get a dictionary with two entries like this:

{'KJ555688.1': ['Diaphus anderseni'], 'KJ555689.1': ['Diaphus anderseni']}

But I would also like to prepare a dictionary that has the 'specimen_voucher' information , so it looks like this:

{'KJ555688.1': ['SIO:10-169'], 'KJ555689.1': ['SIO:10-170']}

I prepared this last dictionary manually, by looking up the complete GenBank record for KJ555688 and KJ555689. But I would like to be able to do it in python, to do it on a grander scale with hundreds of accession numbers. Any advice on this would be greatly appreciated. Thanks in advance for your time and help.

py python biopython GenBank record • 1.3k views
ADD COMMENT
0
Entering edit mode
5.6 years ago
AK ★ 2.2k

Hi sw.knudsen,

specimen_voucher will be record["GBSeq_feature-table"][0]["GBFeature_quals"][3]["GBQualifier_value"]:

$ cat test.py
from Bio import Entrez
Entrez.email = "email@any_email.com"

organisms=["Diaphus anderseni"]
genes = ["H3"]
specs = {}
acclist = {}
for org in organisms:
    for gene in genes:
        query= org+"[organism] AND "+gene+"[gene]"
        res = Entrez.esearch(db="nucleotide", term=query, retmax=10000)
        rec = Entrez.read(res)
        res = Entrez.efetch(db="nucleotide", id=rec["IdList"],  retmode = "xml")
        for record in Entrez.read(res):
            speciesName = record["GBSeq_organism"]
            accn = record["GBSeq_accession-version"]
            specimenVoucher = record["GBSeq_feature-table"][0]["GBFeature_quals"][3]["GBQualifier_value"]
            if accn in acclist:
                acclist[accn].append(specimenVoucher)
            else:
                acclist[accn] = [specimenVoucher]

print(acclist)

$ python test.py
{'KJ555689.1': ['SIO:10-170'], 'KJ555688.1': ['SIO:10-169']}
ADD COMMENT

Login before adding your answer.

Traffic: 2694 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6