syntax questions about getting sequences from KO numbers using KEGGREST
6.3 years ago
jon.sy.tarn ▴ 10

I suppose my question is in the same vein as previous posts such as this one:

download KEGG genes sequence in fasta format

Basically, what I want to do is feed a list of KO numbers from KEGG into a program and get the amino acid or nucleotide FASTA sequences for each of these KO numbers.

Based on what I've already read, I need to be using KEGGREST.

However, I'm having some trouble deciphering the syntax.

This is the usage given for keggGet in the API manual:

keggGet(dbentries, option = c("aaseq", "ntseq", "mol", "kcf", "image", "kgml"))

and this is an example they show:

res <- keggGet(c("hsa:10458", "ece:Z5100"), "aaseq") ## retrieves amino
## acid sequences of a human gene and an
## E.coli O157 gene
str(res)

My question is: how do I decipher this? Am I to assume that I can enter a KO number in place of the ece or hsa IDs shown above?

Sorry for the potentially basic question.

6.3 years ago
Mark ★ 1.6k

Yes, that's correct. It might be helpful to explicitly name each argument to illustrate how the function operates:

res <- keggGet(option = "aaseq", dbentries = c("hsa:10458", "ece:Z5100"))

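option selects what to retrieve for each entry (here "aaseq", the amino acid sequence) and dbentries holds the IDs of the entries you want to retrieve. With a sequence option, keggGet returns a Biostrings object (an AAStringSet for "aaseq", a DNAStringSet for "ntseq"); without an option it returns a list, which you can subset using the $ notation. You can then use the Biostrings package to manipulate the sequences.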

If you have a list of, say, 100 IDs you want to query, you can automate the process like this:

my_list <- c("hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100", 
             "hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100",
             "hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100",
             "hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100")

split_my_list <- split(my_list, 1:4)
results <- lapply(X = split_my_list, FUN = keggGet, option = "aaseq")

I've copied the same entries over and over again just for illustration purposes. Using the function split, I've split the list into 4 chunks (change 1:4 to 1:10 or whatever you want; R warns if the list length is not a multiple of the number of chunks). Then I use lapply to apply keggGet to each chunk.

It will return a list with one element per chunk, named after the chunk, so subset it with backtick-quoted names, for example results$`1`, or equivalently results[["1"]].
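As a rough sketch of the chunking idea above: keggGet accepts at most 10 entries per call, so it can be convenient to cut the ID vector into consecutive batches of a fixed size instead of choosing the number of chunks by hand. The helper name fetch_in_batches and its arguments are hypothetical (not part of KEGGREST), and the demonstration uses made-up IDs with a stand-in fetcher so that it runs without network access.

```r
# Hypothetical helper (not part of KEGGREST): split a character vector of IDs
# into consecutive batches of at most `batch_size` and apply `fetch` to each.
fetch_in_batches <- function(ids, fetch, batch_size = 10, ...) {
  stopifnot(is.character(ids), batch_size >= 1)
  ids <- unique(ids)                                   # avoid repeated requests
  batch_index <- ceiling(seq_along(ids) / batch_size)  # 1,1,...,2,2,...
  batches <- split(ids, batch_index)                   # named "1", "2", ...
  lapply(batches, fetch, ...)                          # one call per batch
}

# Offline demonstration: made-up IDs and a stand-in fetcher (toupper)
# in place of a real KEGG request.
ids <- sprintf("hsa:%d", 1:23)
demo <- fetch_in_batches(ids, fetch = toupper)
print(lengths(demo))
```

With KEGGREST loaded, the same helper could presumably be called as fetch_in_batches(my_list, keggGet, option = "aaseq"), which would give one result per batch of up to 10 IDs.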
