I want to extract the "Country of isolation" from the GenBank records. I tried to run the following code in Google Colab. accessions.txt contains these accession numbers: ['GCA_001719305.1', 'GCA_903231415.1', 'GCA_903231425.1', 'GCA_903231445.1', 'GCA_903231465.1', 'GCA_903231475.1', 'GCA_903231515.1']
from Bio import Entrez

# Read the accessions from a file
accessions_file = 'accessions.txt'
with open(accessions_file) as f:
    ids = f.read().split('\n')

# Fetch the entries from Entrez
Entrez.email = 'name@example.org'  # Insert your email here
handle = Entrez.efetch('nuccore', id=ids, retmode='xml')
response = Entrez.read(handle)

# Parse the entries to get the country
def extract_countries(entry):
    sources = [feature for feature in entry['GBSeq_feature-table']
               if feature['GBFeature_key'] == 'source']
    for source in sources:
        qualifiers = [qual for qual in source['GBFeature_quals']
                      if qual['GBQualifier_name'] == 'country']
        for qualifier in qualifiers:
            yield qualifier['GBQualifier_value']

for entry in response:
    accession = entry['GBSeq_primary-accession']
    for country in extract_countries(entry):
        print(accession, country, sep=',')
I am getting the following error. Please help me resolve it. Thanks in advance.
HTTPError Traceback (most recent call last)
<ipython-input-17-4518f5766224> in <module>()
1 Entrez.email = 'pryp88@gmail.com'
----> 2 handle = Entrez.efetch('nuccore', id=ids, retmode='xml')
3 response = Entrez.read(handle)
7 frames
/usr/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
647 class HTTPDefaultErrorHandler(BaseHandler):
648 def http_error_default(self, req, fp, code, msg, hdrs):
--> 649 raise HTTPError(req.full_url, code, msg, hdrs, fp)
650
651 class HTTPRedirectHandler(BaseHandler):
HTTPError: HTTP Error 400: Bad Request
Your code is not reaching the extraction part; it fails at the Entrez query itself. Can you run Python in an interactive session with a single ID to make sure your code works up to the "response = ..." line?

The code indentation looks wrong. That could be a formatting problem in the post or a genuine problem in the code.
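Because the code fails before it reaches the extraction step, that step can be checked on its own, without network access, by passing extract_countries a hand-built record. The dictionary below only mimics the nested keys the function reads; the accession and country values are made up and do not come from a real GenBank entry. The function is the one from the question, lightly restructured, with one added assumption: .get(..., []) in case a source feature carries no qualifiers.

```python
def extract_countries(entry):
    """Yield every 'country' qualifier value from the 'source' features of one record."""
    for feature in entry["GBSeq_feature-table"]:
        if feature["GBFeature_key"] != "source":
            continue
        # Assumption: a feature may lack qualifiers, so fall back to an empty list.
        for qual in feature.get("GBFeature_quals", []):
            if qual["GBQualifier_name"] == "country":
                yield qual["GBQualifier_value"]


# Hypothetical record with the same nested keys as a parsed GBSeq entry; values are made up.
fake_entry = {
    "GBSeq_primary-accession": "EXAMPLE0001",
    "GBSeq_feature-table": [
        {
            "GBFeature_key": "source",
            "GBFeature_quals": [
                {"GBQualifier_name": "organism", "GBQualifier_value": "Example organism"},
                {"GBQualifier_name": "country", "GBQualifier_value": "Exampleland"},
            ],
        },
        {"GBFeature_key": "gene"},  # non-source features are skipped
    ],
}

for country in extract_countries(fake_entry):
    print(fake_entry["GBSeq_primary-accession"], country, sep=",")  # → EXAMPLE0001,Exampleland
```

If this prints the expected line, the parsing logic is fine and the problem is confined to the efetch call.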
Try with one example.
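To try one example at a time, a loop can send each accession separately and record which ones fail, together with whatever text the server put in the error response, which may say why the request was rejected. find_failing_ids is a hypothetical helper, and fetch is whichever single-ID call you want to test, so the loop itself can be exercised without network access. Separately, the GCA_ prefix marks NCBI Assembly accessions rather than nucleotide records, so it may also be worth checking whether these IDs are accepted by db='nuccore' at all.

```python
from urllib.error import HTTPError


def find_failing_ids(ids, fetch):
    """Call fetch(acc) for each accession; return {accession: error text} for failures.

    `fetch` is any callable that performs one request for one accession
    (hypothetical helper; see the usage note for an Entrez-based call).
    """
    failures = {}
    for acc in ids:
        try:
            fetch(acc)
        except HTTPError as err:
            # The response body, if present, may contain the server's reason for the error.
            body = err.read().decode(errors="replace") if err.fp is not None else ""
            failures[acc] = f"HTTP {err.code}: {body.strip()}"
    return failures
```

In Colab it could be called as find_failing_ids(ids, lambda acc: Entrez.efetch(db='nuccore', id=acc, retmode='xml').read()); an empty dict would mean every ID was accepted on its own.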
I suspect that, if the indentation is correct and the code is exactly as you posted it, the problem may be that ids is a list, which efetch may not be able to recognise.
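Whether or not efetch copes with a list, it is worth checking what that list contains. f.read().split('\n') leaves an empty string at the end when the file ends with a newline, and if accessions.txt literally contains the bracketed list shown in the question, the whole list becomes a single "ID" with brackets and quotes in it. The hypothetical read_accessions helper below is a minimal sketch that handles both layouts (an assumption about what the file looks like) and drops blank entries.

```python
import ast


def read_accessions(text):
    """Return a clean list of accessions from the text of accessions.txt.

    Assumed layouts: one accession per line, or a single Python-style list
    such as ['GCA_001719305.1', 'GCA_903231415.1'].
    """
    stripped = text.strip()
    if stripped.startswith("["):
        # The file holds a list literal: parse it safely instead of splitting on newlines.
        items = ast.literal_eval(stripped)
    else:
        items = stripped.splitlines()
    # Remove surrounding whitespace and skip blank entries.
    return [str(item).strip() for item in items if str(item).strip()]


# A plain split keeps a trailing empty string when the text ends with a newline:
print("GCA_001719305.1\nGCA_903231415.1\n".split("\n"))  # → ['GCA_001719305.1', 'GCA_903231415.1', '']
print(read_accessions("GCA_001719305.1\nGCA_903231415.1\n"))  # → ['GCA_001719305.1', 'GCA_903231415.1']
print(read_accessions("['GCA_001719305.1', 'GCA_903231415.1']"))  # → ['GCA_001719305.1', 'GCA_903231415.1']
```

In the question's code this would replace the split line, e.g. ids = read_accessions(f.read()).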
Are you behind a proxy?
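Bio.Entrez sends its requests through Python's standard urllib, so one way to answer the proxy question is to look at which proxies urllib picks up from the environment. The sketch below prints them and defines a hypothetical use_proxy helper for routing traffic through a proxy; the address in it is a placeholder, not a real server. The helper installs a fresh default opener because urllib builds its default opener once, on the first request, so proxy variables set later in the same session would otherwise not be used.

```python
import os
import urllib.request


def use_proxy(proxy_url):
    """Route urllib traffic (and therefore Bio.Entrez requests) through proxy_url.

    Sets the conventional http_proxy/https_proxy environment variables, then
    installs a fresh default opener so one built before this call is replaced.
    """
    for scheme in ("http", "https"):
        os.environ[f"{scheme}_proxy"] = proxy_url
    # build_opener() adds a ProxyHandler that reads the variables set above.
    urllib.request.install_opener(urllib.request.build_opener())


# Show which proxies urllib currently picks up (an empty dict means none are configured).
print(urllib.request.getproxies())

# Placeholder address; replace it with your network's real proxy, if there is one.
# use_proxy("http://proxy.example.org:8080")
```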