Question

Error using NCBI Python guide

5.0 years ago

nobu.kim66 ▴ 40

Found here: https://www.ncbi.nlm.nih.gov/dbvar/content/tools/entrez/

When I run the loop from the guide, I get:

KeyError                                  Traceback (most recent call last)
<ipython-input-35-be9c80362590> in <module>
----> 1 for ds in dsdocs['eSummaryResult']['DocumentSummarySet']['DocumentSummary']:
      2     for p in ds['dbVarPlacementList']['dbVarPlacement']:
      3         print(ds['@uid'], ds['ST'], ds['SV'], p['Chr'], p['Chr_start'],p['Chr_end'], p['Chr_inner_start'],p['Chr_inner_end'])

KeyError: 'DocumentSummarySet'

Also, in their guide, after the xmltodict.parse() step, there is a nested loop that does not seem to be properly indented. I assume it should read:

for ds in dsdocs['eSummaryResult']['DocumentSummarySet']['DocumentSummary']:
    for p in ds['dbVarPlacementList']['dbVarPlacement']:
        print(ds['@uid'], ds['ST'], ds['SV'], p['Chr'], p['Chr_start'],p['Chr_end'], p['Chr_inner_start'],p['Chr_inner_end'])

In the guide the nested for loop appears at the same indentation level as the outer loop.
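One way to find out why the KeyError is raised is to check which keys the parsed response actually contains before indexing into it. The following is a minimal sketch of the guide's loop with the inner loop nested. `as_list` and `iter_placements` are hypothetical helper names (not part of the guide or of xmltodict), and the sample record is made up for illustration. It assumes `dsdocs` is the dict produced by xmltodict, which gives a single dict rather than a list when an element occurs only once.

```python
def as_list(value):
    """Normalise xmltodict output: None -> [], single dict -> [dict], list -> list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def iter_placements(dsdocs):
    """Yield (summary, placement) pairs from a parsed eSummary response."""
    result = dsdocs.get("eSummaryResult") or {}
    summary_set = result.get("DocumentSummarySet")
    if summary_set is None:
        # Report whatever came back instead of failing with a bare KeyError.
        raise ValueError(f"No DocumentSummarySet; keys present: {sorted(result)}")
    for ds in as_list(summary_set.get("DocumentSummary")):
        placement_list = ds.get("dbVarPlacementList") or {}
        for p in as_list(placement_list.get("dbVarPlacement")):
            yield ds, p


# Hypothetical record, shaped like the structure the guide's loop expects.
sample = {
    "eSummaryResult": {
        "DocumentSummarySet": {
            "DocumentSummary": {
                "@uid": "1", "ST": "study", "SV": "variant",
                "dbVarPlacementList": {
                    "dbVarPlacement": {"Chr": "1", "Chr_start": "100", "Chr_end": "200"}
                },
            }
        }
    }
}

for ds, p in iter_placements(sample):
    print(ds["@uid"], ds["ST"], ds["SV"], p["Chr"], p["Chr_start"], p["Chr_end"])
```

If the ValueError fires on the real response, the listed keys should show what the server returned in place of the summary set.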

Does anyone know how to search db='proteins' for, say, lactate in Mus musculus?

It looks like I got some hits by substituting the following line for theirs:

eSearch = Entrez.esearch(db=db,term='lactate AND mus musculus[organism]', **paramEutils)

Result was:

# get results as dict
res = Entrez.read(eSearch)
for k in res:
    print(k, "=", res[k])

Count = 248
RetMax = 20
RetStart = 0
QueryKey = 1
WebEnv = MCID_5fa592044e76ef26396680f4
IdList = ['927028883', '927028881', '226061948', '85701812', '27369928', '6679261', '1780282714', '295317388', '257743039', '188219522', '188035865', '161333819', '113865979', '113865977', '110347555', '84697028', '84579885', '13507630', '8393739', '7305143']
TranslationSet = [{'From': 'mus musculus[organism]', 'To': '"Mus musculus"[Organism]'}]
TranslationStack = [{'Term': 'lactate[All Fields]', 'Field': 'All Fields', 'Count': '957836', 'Explode': 'N'}, {'Term': '"Mus musculus"[Organism]', 'Field': 'Organism', 'Count': '342504', 'Explode': 'Y'}, 'AND']
QueryTranslation = lactate[All Fields] AND "Mus musculus"[Organism]
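Note that Count is 248 but RetMax is 20, so IdList holds only the first 20 IDs. A rough sketch of how the remaining hits could be paged through, assuming the standard E-utilities `retstart` and `retmax` parameters. `batch_starts` is a hypothetical helper name, and the `res` dict below just copies the two values from the output above (as strings, on the assumption that this is how they come back from the parser).

```python
def batch_starts(count, retmax):
    """Return the retstart offsets needed to cover `count` hits in pages of `retmax`."""
    return list(range(0, count, retmax))


# Values copied from the result above.
res = {"Count": "248", "RetMax": "20"}

starts = batch_starts(int(res["Count"]), int(res["RetMax"]))
print(len(starts), starts[-1])  # prints: 13 240

# Each offset would then go into a separate search call, for example:
#   Entrez.esearch(db=db, term=term, retstart=start, retmax=20)
```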

What I hope to achieve is the following. First, search the proteins database by keyword terms and get a list (or other data structure) of accession numbers for any hits. Then choose one accession number, search for its homologs, and get a list or other data structure of those as well. I would also like to be able to get the FASTA file for any accession number that comes up during any part of that search. Documentation seems sparse on this use case.
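As a starting point for the keyword search and FASTA steps, here is a hedged sketch that only builds the E-utilities request URLs, so the parameters are visible. It assumes the Entrez database name is `protein` (singular) and uses the `rettype=fasta` and `retmode=text` options of efetch. `esearch_url` and `efetch_fasta_url` are hypothetical helper names. The homolog step is not covered here.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="protein", retmax=20):
    """Build an esearch URL for a keyword query."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def efetch_fasta_url(ids, db="protein"):
    """Build an efetch URL that asks for FASTA text for one or more IDs."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)


search = esearch_url("lactate AND mus musculus[organism]")
# The first two IDs from the IdList shown earlier in this post.
fasta = efetch_fasta_url(["927028883", "927028881"])
print(search)
print(fasta)
```

With Biopython, the same parameters should map onto `Entrez.efetch(db="protein", id=..., rettype="fasta", retmode="text")`, whose handle can be read as plain FASTA text.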

entrez python api • 748 views

updated 5.0 years ago by Ram 45k • written 5.0 years ago by nobu.kim66 ▴ 40