Question

Fetch KO IDs for all genes in a genome using accession number only

6.4 years ago

Areej.alsheikh • 0

Hi,

Using a GenBank (or RefSeq) accession number, what is the best way to obtain the list of KOs using the KEGG API?

For example, the genome Bacillus cereus ATCC 10987 has the GenBank accession number GCA_000008005.1. How can I get its list of KOs using only GCA_000008005.1? I know that I can do this with the link operation, but only if I have the T number (https://rest.kegg.jp/link/ko/<t-number>). My question is: if I have a list of >10K genomes and only their GenBank accession numbers, how can I achieve this using the KEGG API?

Thank you.

KO accession genbank kegg api • 2.6k views

score 0 · Answer 1 · 2018-07-02

6.4 years ago

Nitin Narwade ★ 1.6k

In this case KEGGREST, an R package, would be really helpful. All you need is the three- or four-letter organism code assigned by KEGG.

I have tried with Bacillus cereus ATCC 10987 (KEGG ORG code: bca).

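if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("KEGGREST")
library(KEGGREST)
# List every pathway KEGG holds for the organism code "bca"
all.pathways.for.bca <- keggList("pathway", "bca")

The above code will return all pathways for bca.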

You can find detailed help in the KEGGREST package documentation.

Thanks. With the approach you're suggesting, I'd still need the organism name which I don't have. I only have the genbank accession ID. I figured out the solution, which I'll post in a separate post so people can see. Thanks for your reply :)

Reply by Areej.alsheikh • 6.4 years ago

score 0 · Answer 2 · 2018-07-08

I figured out a solution. There is no direct way to query KEGG with a GenBank genome accession to get the list of KOs (or whatever else you're looking for). Instead, you'll have to do the following:

1. Get the list of all genomes available in KEGG (this shows the T number and organism code of each one):

https://rest.kegg.jp/list/genome

2. Use the T number to obtain the genome page for each organism that exists in KEGG, for example:

https://rest.kegg.jp/get/gn:T00001

3. Parse each genome page looking for the assembly accession, which should be on the DATA_SOURCE line in the form (Assembly: acc-id). If that accession is one you're looking for, save it together with its T number.

4. Use the T numbers of the genomes you want to fetch the list of KOs (or whatever else you're looking for), for example:

https://rest.kegg.jp/link/ko/T00001
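
The workflow above could be scripted roughly as follows. This is only a minimal Python sketch, not the code I actually ran: the function names, the `delay` pause between requests, and the regular expression for the assembly accession are all assumptions, and it uses the KEGG REST base URL https://rest.kegg.jp. The regex accepts both GCA_ and GCF_ prefixes, since a genome page may list either a GenBank or a RefSeq assembly.

```python
import re
import time
import urllib.request

BASE = "https://rest.kegg.jp"  # KEGG REST API base URL
# Assumed pattern for the "(Assembly: acc-id)" part of a genome entry
ASSEMBLY_RE = re.compile(r"Assembly:\s*(GC[AF]_\d+\.\d+)")


def fetch(path):
    """Return the text body of a KEGG REST request such as 'list/genome'."""
    with urllib.request.urlopen(f"{BASE}/{path}") as resp:
        return resp.read().decode("utf-8")


def parse_t_numbers(listing):
    """Extract T numbers from the tab-separated output of list/genome."""
    t_numbers = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        first = line.split("\t", 1)[0]
        # Drop an optional database prefix such as "gn:"
        t_numbers.append(first.split(":", 1)[-1])
    return t_numbers


def parse_assembly(entry):
    """Return the assembly accession found in a genome entry, or None."""
    match = ASSEMBLY_RE.search(entry)
    return match.group(1) if match else None


def parse_ko_links(text):
    """Turn link/ko output into {gene_id: [ko_id, ...]}."""
    kos = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 2:
            gene, ko = parts
            kos.setdefault(gene, []).append(ko.split(":", 1)[-1])
    return kos


def map_accessions_to_t_numbers(wanted, delay=0.5):
    """Scan every KEGG genome entry and keep those whose accession is wanted."""
    wanted = set(wanted)
    found = {}
    for t in parse_t_numbers(fetch("list/genome")):
        acc = parse_assembly(fetch(f"get/gn:{t}"))
        if acc in wanted:
            found[acc] = t
        if len(found) == len(wanted):
            break  # every requested accession has been matched
        time.sleep(delay)  # assumed pause, to avoid hammering the server
    return found


def kos_for_t_number(t):
    """Fetch and parse the gene-to-KO links for one T number."""
    return parse_ko_links(fetch(f"link/ko/{t}"))
```

Calling `map_accessions_to_t_numbers(["GCA_000008005.1"])` would walk the genome list until it finds a match, and each T number it returns can then be passed to `kos_for_t_number`. Because the scan makes one request per genome, for >10K accessions it is probably worth running it once and saving the accession-to-T-number table to a file.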

It's a long process, but it worked for me. I hope it helps someone else. Areej