In 2016 NCBI announced that it will phase out GI numbers. https://ncbiinsights.ncbi.nlm.nih.gov/2016/09/12/the-future-of-existing-gi-numbers-at-ncbi/
And the documentation says: "Special note for sequence databases. NCBI is no longer assigning GI numbers to a growing number of new sequence records. As such, these records are not indexed in Entrez, and so cannot be retrieved using ESearch or ESummary, and have no Entrez links accessible by ELink. EFetch can retrieve these records by including their accession.version identifier in the id parameter."
I used Biopython today (July 19th, 2023) to download records from NCBI. I used IDs = Entrez.read(Entrez.esearch(db="nucleotide", retmax=10, term=search_term)) to obtain the records matching search_term. This returns the old GI numbers. The first GI number I get is 2530893015. I looked at the record and it was added in 2023. This surprised me, since it is now 7 years after NCBI said it would stop assigning GI numbers to new records.
The new way to use esearch is, as far as I understand: IDs2 = Entrez.read(Entrez.esearch(db="nucleotide", idtype="acc", term=search_term, retmax=10)) This returns accession+version numbers.
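A minimal sketch of that call, followed by an EFetch on the returned identifiers. It assumes Biopython is installed; search_term and the e-mail address are placeholders, and the function names is_gi and search_and_fetch are hypothetical, chosen only for this illustration.

```python
def is_gi(identifier):
    """Return True if the identifier is purely numeric, i.e. looks like a GI number."""
    return identifier.isdigit()


def search_and_fetch(search_term, email, retmax=10):
    """Run ESearch with idtype="acc", then EFetch the hits as GenBank flat files."""
    from Bio import Entrez  # imported here so is_gi works without Biopython

    Entrez.email = email  # NCBI asks for a contact address with each request
    handle = Entrez.esearch(db="nucleotide", term=search_term,
                            idtype="acc", retmax=retmax)
    result = Entrez.read(handle)
    handle.close()

    accessions = list(result["IdList"])
    if not accessions:
        return accessions, ""

    # EFetch accepts accession.version identifiers in the id parameter.
    handle = Entrez.efetch(db="nucleotide", id=",".join(accessions),
                           rettype="gb", retmode="text")
    records = handle.read()
    handle.close()
    return accessions, records
```

Calling search_and_fetch(search_term, "you@example.org") should return the accession.version list and the GenBank text; is_gi can be applied to each returned identifier to check which kind of ID a given esearch call produced.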
If I increase retmax, both methods return the same total number of records. I would have assumed that records without a GI number would not be found by the first method, so that the number of hits would differ. But this does not seem to be the case.
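The comparison can be made without raising retmax at all, because the ESearch result carries the total number of hits in its Count field. A rough sketch, again assuming Biopython and a placeholder e-mail address; the helper names counts_match and run_both_searches are hypothetical.

```python
def counts_match(gi_result, acc_result):
    """Compare the total hit counts reported by two parsed ESearch results."""
    return int(gi_result["Count"]) == int(acc_result["Count"])


def run_both_searches(search_term, email):
    """Run the same query once with default (GI) IDs and once with idtype="acc"."""
    from Bio import Entrez  # imported here so counts_match works without Biopython

    Entrez.email = email
    results = []
    for extra in ({}, {"idtype": "acc"}):
        handle = Entrez.esearch(db="nucleotide", term=search_term, **extra)
        results.append(Entrez.read(handle))
        handle.close()
    return results  # [gi_result, acc_result]
```

If counts_match(*run_both_searches(search_term, "you@example.org")) is True, both ID types report the same number of hits for that query. This only shows that the two calls agree with each other; it cannot show whether records exist that neither call finds.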
The documentation says: "NCBI is no longer assigning GI numbers to a growing number of new sequence records. As such, these records are not indexed in Entrez, and so cannot be retrieved using ESearch or ESummary, and have no Entrez links accessible by ELink."
Does this mean that I might miss some records and that esearch is no longer a reliable way to search NCBI? So far, I see no indication of this.
So it looks as if GI numbers are still assigned to all new records, or am I making a mistake by trusting Entrez.esearch to find all records that match my query?
Many thanks for comments in advance.
As I understand it, all NCBI indicated is that end users should move away from using gi numbers as reliable identifiers. It seemed that they were going to be used internally at NCBI. Is there a specific reason you are contemplating gi? Accessions are easy to get.
You could always email the NCBI help desk and ask whether their stated policy on gi numbers has changed since that blog post was published back in 2016. Post their response here.