Question

Genbank GI accession format for DAVID input


6.2 years ago

jfo ▴ 40

Hi!

I would like some advice on the proper format of GENBANK_GI_ACCESSION identifiers for DAVID functional analysis. I tried these formats:

gi123456
gi|123456
123456

Sadly, none of them worked, and I could not find any examples. I prefer GI numbers because all of my unigenes of interest have them. Some of them also have RefSeq accessions or gene symbols, but not all. And yes, I'm not sure what I'm doing. Any help will be appreciated!

DAVID Functional Analysis • 2.0k views

ADD COMMENT • link updated 6.2 years ago by Istvan Albert 102k • written 6.2 years ago by jfo ▴ 40

score 0 · Answer 1 · 2019-02-19


6.2 years ago

Istvan Albert 102k

Using GI numbers is a bad idea. NCBI has stopped using them and no longer releases data with GI numbers, so you are guaranteed to be operating on outdated information. You may run into various kinds of mysterious errors as well, although the problems with DAVID come down to it being an atrocious system to begin with.

Having a GI number without an accession number also sounds quite unexpected; the chances that a tool will work with such data are again much reduced. You can convert GI numbers to accession numbers with Entrez Direct:

efetch -db nuccore -id 663070995,568815587 -format acc

to produce:

NM_001178.5
NC_000011.10

or an even simpler way as stated here:

https://ncbiinsights.ncbi.nlm.nih.gov/2016/12/06/converting-gi-numbers-to-accession-version/

with a command such as:

curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=663070995,568815587&rettype=acc'

which will produce the same output:

NM_001178.5
NC_000011.10
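
For a longer list, the same request can be scripted. This is only a minimal sketch, assuming the GI numbers sit one per line in a text file (`gi.txt` is a made-up name). The helper `gi_url` is hypothetical and does nothing more than build the URL shown above. The optional second argument picks the Entrez database; protein GIs would presumably need `protein` instead of `nuccore`.

```shell
# Build the efetch URL for a file of GI numbers (one per line).
# gi_url is a hypothetical helper name, not part of any NCBI tool.
gi_url() {
    # Database defaults to nuccore, as in the commands above.
    db=${2:-nuccore}
    # Drop an optional "gi" or "gi|" prefix, skip blank lines, join with commas.
    ids=$(sed -e 's/^gi|\{0,1\}//' -e '/^$/d' "$1" | paste -sd, -)
    echo "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=${db}&id=${ids}&rettype=acc"
}

# Usage (needs network access; gi.txt is an assumed file name):
# curl "$(gi_url gi.txt)"
```

Very long lists may need to be split into smaller batches before they fit into a single URL.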

Verify that your GI numbers do indeed lack an accession number.

PS:

You could even just build the right URL and paste it into your browser:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=663070995,568815587&rettype=acc

ADD COMMENT • link 6.2 years ago by Istvan Albert 102k


Thank you for the prompt answer. I do have the accessions; however, I am not sure how to use different input types for the analysis in DAVID. For example, my accessions are labeled ref (mostly XP_), dbj, gb, or sp. How do I convert these accession IDs to a DAVID-"interpret-able" format? I am having a hard time finding a way to do this. For example, I tried the Retrieve/ID mapping tool at UniProtKB, but not all of my unigenes with a GI number mapped to a UniProt entry. I do not know how to proceed from there.

ADD REPLY • link 6.2 years ago by jfo ▴ 40


DAVID ought to understand many different types of accession numbers. Try something simple first: use only a subset of the gene names to get your bearings and make sure that it works. If you are not sure what to pick, start here:

http://data.biostarhandbook.com/redo/zika/zika-up-regulated.csv

Take the 20 gene names from the first column and see if you can make DAVID work.

I would also recommend an alternative tool

https://biit.cs.ut.ee/gprofiler/gost

and the converter here

https://biit.cs.ut.ee/gprofiler/convert
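
Tools like these generally want bare accessions rather than full pipe-delimited identifiers. If your hit IDs are in the older NCBI layout `gi|number|db|accession|` (an assumption about your data), the accession is the fourth field regardless of whether the tag is ref, gb, dbj, or sp. A minimal sketch, where `extract_acc` is a hypothetical helper name and the sample ID in the usage line is made up by pairing a placeholder GI with the accession from your example:

```shell
# Print the accession (4th pipe-delimited field) from legacy NCBI-style IDs.
# extract_acc is a hypothetical helper; the input layout is an assumption.
extract_acc() {
    # Only lines that start with the "gi" tag and have an accession field are printed.
    awk -F'|' '$1 == "gi" && NF >= 4 { print $4 }' "$@"
}

# Usage:
# printf 'gi|123456|gb|ABC87995.1|\n' | extract_acc
```

Lines that do not follow that layout are skipped silently, so compare the number of input and output lines before using the result.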

ADD REPLY • link 6.2 years ago by Istvan Albert 102k


I have the Official Gene Symbols, which actually work. My confusion comes from the unigenes that have nr hits but no gene symbols. Should I just proceed with those that have gene symbols? This is why I was looking for a way to give all of the unigenes with nr hits some other identifier (e.g. gene symbol, UniProt) to represent them. I'm not even sure if this is possible, though.

As I have mentioned, most of my unigenes had protein hits with an XP_ accession, but some had a gb| or dbj| reference instead. The GI number is the only ID that is present for all of my unigenes with nr hits. Curious question: is it possible to convert all unigenes with nr hits into their corresponding UniProt or gene symbol IDs? I'm asking because I could not seem to find an XP_ counterpart for those with gb| or dbj| (e.g. gb|ABC87995.1). Or is it okay to proceed with the analysis using just those unigenes with gene symbols? I'm so confused that I don't know if my questions are valid.

ADD REPLY • link 6.2 years ago by jfo ▴ 40