Question

GI numbers in NCBI

2.0 years ago

Christoph ▴ 30

In 2016, NCBI announced that it would phase out GI numbers: https://ncbiinsights.ncbi.nlm.nih.gov/2016/09/12/the-future-of-existing-gi-numbers-at-ncbi/

And the documentation says: "Special note for sequence databases. NCBI is no longer assigning GI numbers to a growing number of new sequence records. As such, these records are not indexed in Entrez, and so cannot be retrieved using ESearch or ESummary, and have no Entrez links accessible by ELink. EFetch can retrieve these records by including their accession.version identifier in the id parameter."

I used Biopython today (July 19th, 2023) to download records from NCBI. I ran IDs = Entrez.read(Entrez.esearch(db="nucleotide", retmax=10, term=search_term)) to obtain the IDs of the records matching search_term. This returns the old GI numbers. The first GI number I get is 2530893015. I looked at the record, and it was added in 2023. This surprised me, since that is seven years after NCBI said it would stop assigning GI numbers to records.

The new way to use esearch is, as far as I understand: IDs2 = Entrez.read(Entrez.esearch(db="nucleotide", idtype="acc", term=search_term, retmax=10)). This returns accession.version identifiers instead of GI numbers.

If I increase retmax, esearch finds the same total number of records with both methods. I would have assumed that records without a GI number are not found by the first method, so that the number of hits differs. But this does not seem to be the case.
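The comparison described above can be sketched as follows. This is only a minimal sketch, assuming Biopython is installed and that the parsed ESearch result exposes the "Count" and "IdList" keys; the function names search_both_ways and compare_results, and the email argument, are hypothetical and introduced here for illustration.

```python
def search_both_ways(term, email, retmax=10):
    """Run the same ESearch twice: once with default UIDs (GI numbers),
    once with idtype="acc" (accession.version identifiers)."""
    # Imported here so compare_results() below works without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to supply a contact address

    handle = Entrez.esearch(db="nucleotide", term=term, retmax=retmax)
    gi_result = Entrez.read(handle)
    handle.close()

    handle = Entrez.esearch(db="nucleotide", term=term, retmax=retmax,
                            idtype="acc")
    acc_result = Entrez.read(handle)
    handle.close()
    return gi_result, acc_result


def compare_results(gi_result, acc_result):
    """Compare the total hit counts and returned page sizes of two results."""
    gi_count = int(gi_result["Count"])    # "Count" is parsed as a string
    acc_count = int(acc_result["Count"])
    return {
        "gi_count": gi_count,
        "acc_count": acc_count,
        "same_total": gi_count == acc_count,
        "same_page_size": len(gi_result["IdList"]) == len(acc_result["IdList"]),
    }


# Example call (needs network access; the address is a placeholder):
# gi, acc = search_both_ways("Epacromius tergestinus", "you@example.org")
# print(compare_results(gi, acc))
```

Equal totals only show that both ID types describe the same set of indexed records. They cannot show whether records that were never indexed in Entrez are missing from both searches.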

The documentation says: "NCBI is no longer assigning GI numbers to a growing number of new sequence records. As such, these records are not indexed in Entrez, and so cannot be retrieved using ESearch or ESummary, and have no Entrez links accessible by ELink."

Does this mean that I might miss some records and that esearch is no longer a reliable way to search NCBI? So far, I see no indication of this.

So it looks as if GI numbers are still assigned to all new records. Or am I making a mistake by trusting Entrez.esearch to find all records that match my query?

Many thanks in advance for any comments.

numbers GI NCBI • 1.2k views

ADD COMMENT • link 23 months ago by Christoph ▴ 30

As I understand it, all NCBI indicated is that end users should move away from relying on GI numbers as identifiers. It seemed that they were going to remain in use internally at NCBI.

Is there a specific reason you are contemplating GI numbers? Accessions are easy to get:

$ esearch -db nuccore -query "Epacromius tergestinus" | esummary | xtract -pattern DocumentSummary -element Id,AccessionVersion
2530893015      NC_080530.1
2519790161      OQ282996.1

You could always email the NCBI help desk and ask whether their stated policy on GI numbers has changed since that blog post was published in 2016. Post their response here.

ADD REPLY • link 2.0 years ago by GenoMax 152k

score 2 · Accepted Answer · 2023-08-16

I emailed NCBI. I do not have permission to post the full answer, but I think only the main points are of interest here anyway:

They still plan to transition to accession numbers only, but GI numbers will remain in place for the foreseeable future.
"GI numbers are still assigned, but there are exclusions for voluminous data (for example bacterial contigs and transcriptome records) that we only archive in the Sequence Set Browser (VDB)."