Search using Entrez and return accession numbers (not GI)
8.6 years ago
Cricket

I am trying to use Biopython's Entrez module to run a search that returns accession numbers (not GI numbers*).

Here is a tiny excerpt of my code:

from Bio import Entrez

Entrez.email = 'myemailaddress'
search_phrase = '(Escherichia coli[organism]) AND (complete genome[keyword])'
handle = Entrez.esearch(db='nuccore', term=search_phrase, retmax=100, rettype='acc', retmode='text')
result = Entrez.read(handle)
handle.close()
gi_numbers = result['IdList']
print(gi_numbers)

['745369752', '910228862', '187736741', '802098270', '802098269', '802098267', '387610477', '544579032', '544574430', '215485161', '749295052', '387823261', '387605479', '641687520', '641682562', '594009615', '557270520', '313848522', '309700213', '284919779', '215263233', '544345556', '544340954', '144661', '51773702', '202957457', '202957451', '172051323']

What slice of magic am I missing? Thank you for your assistance.

*especially since NCBI is phasing out GI numbers

Tags: Biopython, Entrez, accession number, GI number, NCBI
8.6 years ago

E-utilities ESearch returns only a list of UIDs (GI numbers, for nuccore), not complete records. You will need EFetch for that. Continuing from your code:

from Bio import Entrez

Entrez.email = 'myemailaddress'
search_phrase = '(Escherichia coli[organism]) AND (complete genome[keyword])'
handle = Entrez.esearch(db='nuccore', term=search_phrase, retmax=100, rettype='acc', retmode='text')
result = Entrez.read(handle)
handle.close()
gi_numbers = result['IdList']

# rettype="acc" makes EFetch return one accession.version per line
h = Entrez.efetch(db="nucleotide", id=gi_numbers, rettype="acc")
accessions = h.read().splitlines()
h.close()
print(accessions)

['HF572917.2', 'NZ_HF572917.1', 'NC_010558.1', 'NZ_HG941720.1', 'NZ_HG941719.1', 'NZ_HG941718.1', 'NC_017633.1', 'NC_022371.1', 'NC_022370.1', 'NC_011601.1', 'NZ_HG738867.1', 'NC_012892.2', 'NC_017626.1', 'HG941719.1', 'HG941718.1', 'HG941720.1', 'HG738867.1', 'AM946981.2', 'FN649414.1', 'FN554766.1', 'FM180568.1', 'HG428756.1', 'HG428755.1', 'M37402.1', 'AJ304858.2', 'FM206294.1', 'FM206293.1', 'AM886293.1', '']
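The list above ends with an empty string, apparently because the EFetch text output finishes with a blank line and `splitlines()` keeps it. Below is a minimal sketch that wraps both steps into one call and drops blank lines. It assumes Biopython is installed and that you set a real email address; `fetch_accessions` and `parse_acc_lines` are hypothetical helper names, not part of Biopython.

```python
def parse_acc_lines(text):
    """Split EFetch rettype="acc" text into accession.version strings, skipping blank lines."""
    if isinstance(text, bytes):  # some Biopython versions may hand back bytes
        text = text.decode()
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_accessions(term, email, db="nuccore", retmax=100):
    """Run ESearch for `term`, then EFetch the hits as accession.version strings."""
    # Imported lazily so parse_acc_lines() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks E-utilities users to identify themselves
    handle = Entrez.esearch(db=db, term=term, retmax=retmax)
    uids = Entrez.read(handle)["IdList"]  # GI numbers by default for nuccore
    handle.close()
    if not uids:
        return []

    handle = Entrez.efetch(db=db, id=uids, rettype="acc", retmode="text")
    text = handle.read()
    handle.close()
    return parse_acc_lines(text)
```

Usage would be something like `fetch_accessions("(Escherichia coli[organism]) AND (complete genome[keyword])", "you@example.org")`.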

Alternatively, install NCBI's Entrez Direct (EDirect) command-line tools and run:

$ esearch -db nuccore -query "(Escherichia coli[organism]) AND (complete genome[keyword])" |efetch -mode text -format acc

HF572917.2
NZ_HF572917.1
NC_010558.1
NZ_HG941720.1
NZ_HG941719.1
NZ_HG941718.1
NC_017633.1
NC_022371.1
NC_022370.1
NC_011601.1
NZ_HG738867.1
NC_012892.2
NC_017626.1
HG941719.1
HG941718.1
HG941720.1
HG738867.1
AM946981.2
FN649414.1
FN554766.1
FM180568.1
HG428756.1
HG428755.1
M37402.1
AJ304858.2
FM206294.1
FM206293.1
AM886293.1
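Note that `retmax=100` in the Python version caps the search at 100 UIDs. For larger searches, the pattern described in the Biopython tutorial is to keep the ESearch result on NCBI's history server (`usehistory="y"`) and page through it with EFetch using `webenv` and `query_key`. A rough sketch, assuming Biopython is installed; `batch_starts` and `fetch_all_accessions` are hypothetical names, and the batch size of 500 is an arbitrary choice:

```python
def batch_starts(count, batch_size):
    """Yield (retstart, retmax) pairs that cover `count` records in chunks of `batch_size`."""
    for start in range(0, count, batch_size):
        yield start, min(batch_size, count - start)


def fetch_all_accessions(term, email, db="nuccore", batch_size=500):
    """Store the ESearch hits on NCBI's history server, then EFetch accessions in batches."""
    from Bio import Entrez  # lazy import; only needed for the network calls

    Entrez.email = email
    handle = Entrez.esearch(db=db, term=term, usehistory="y")
    search = Entrez.read(handle)
    handle.close()

    count = int(search["Count"])
    accessions = []
    for start, size in batch_starts(count, batch_size):
        handle = Entrez.efetch(
            db=db,
            rettype="acc",
            retmode="text",
            retstart=start,
            retmax=size,
            webenv=search["WebEnv"],
            query_key=search["QueryKey"],
        )
        text = handle.read()
        handle.close()
        if isinstance(text, bytes):
            text = text.decode()
        # Drop blank lines such as the trailing one seen in the output above
        accessions.extend(line.strip() for line in text.splitlines() if line.strip())
    return accessions
```

Fetching in batches keeps each request small, which is gentler on NCBI's servers than asking for thousands of records at once.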


That works great! However, with NCBI getting rid of GI numbers, this will stop working soon, right?
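One way to avoid GI numbers altogether, if I read the E-utilities documentation correctly, is ESearch's `idtype=acc` parameter: for sequence databases such as nuccore and protein it makes `IdList` contain accession.version identifiers instead of GIs. Biopython's `Entrez.esearch` passes extra keyword arguments through to the request URL, so a sketch might look like this. `build_query` and `search_accessions` are hypothetical names; the query builder only guards against unbalanced parentheses in the search string.

```python
def build_query(organism, keyword):
    """Build a balanced '(X[organism]) AND (Y[keyword])' Entrez query."""
    return f"({organism}[organism]) AND ({keyword}[keyword])"


def search_accessions(term, email, db="nuccore", retmax=100):
    """Return accession.version IDs directly from ESearch via idtype="acc"."""
    from Bio import Entrez  # lazy import so build_query() works without Biopython

    Entrez.email = email
    # idtype="acc" asks ESearch for accession.version IDs instead of GI numbers
    handle = Entrez.esearch(db=db, term=term, retmax=retmax, idtype="acc")
    result = Entrez.read(handle)
    handle.close()
    return list(result["IdList"])
```

Usage: `search_accessions(build_query("Escherichia coli", "complete genome"), "you@example.org")`. This skips the EFetch round trip entirely, since no GI numbers are involved at any step.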
