I'm looking for the simplest of tables, but it's proving difficult to find: a table of KEGG orthologs in the following format:
[MODULE]\t[KO_1, KO_2, ..., KO_N]
I downloaded an oddly formatted flat file from KEGG, but some of the KEGG modules contain other KEGG modules in their hierarchy (yes, I know KEGG is hierarchical), such as https://www.genome.jp/kegg-bin/show_module?M00615
Does anyone know where I can find this? I just need a very simple table for set comprehension.
You can use R to construct the table. First install and load the KEGGREST package. The code below connects to the KEGG database, retrieves the list of modules, looks up the orthologs for each module, and constructs the table.
#Install package to get relevant information from KEGG database
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("KEGGREST")
#Note: Comment out the above code once the installation is successful
#Load package
library(KEGGREST)
#Get list of modules in KEGG
mod <- keggList("module")
#Loop through each module and get corresponding orthologs
#Return module ID, corresponding list of orthologs
#obj is a list of character vectors c(moduleID,orthologs)
#[[moduleID1,orthologs list 1],[moduleID2,orthologs list 2].etc]
obj<-lapply(names(mod),function(x)
{
    #Strip the "md:" prefix, if present, to get the bare module ID
    module<-sub("^md:","",x)
    #Search for corresponding ortholog
    ko<-keggGet(x)
    #Save list of orthologs as a string separated by ","
    orthologs<-paste(names(ko[[1]]$ORTHOLOGY),collapse = ",")
    #Return a character vector so that rbind() builds a character matrix
    c(module,orthologs)
})
#Combine the per-module vectors into a two-column data frame
df<-as.data.frame(do.call(rbind,obj),stringsAsFactors = FALSE)
#Name columns
colnames(df)<-c("Module","KO")
#Display first few entries in the table
head(df)
#Save table to csv file (replace the placeholder path with your own)
write.csv(df,"path to file/filename.csv")
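The layout asked for in the question (module ID, a tab, then the comma-separated KO list) can be sketched as follows. This is only a minimal sketch: it uses a toy table with the same two columns as `df` so that it runs without a network connection, the second row is a made-up placeholder, and the output goes to a temporary file rather than a real path.

```r
# Toy table with the same two columns as `df`. The first row reuses KO IDs
# quoted in this thread; the second row is a made-up placeholder.
df <- data.frame(
  Module = c("M00011", "M99999"),
  KO = c("K01676,K01679,K00026", "K99998,K99999"),
  stringsAsFactors = FALSE
)

# Write one line per module: MODULE<TAB>KO_1,KO_2,...,KO_N
out_file <- tempfile(fileext = ".tsv")
write.table(df, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the file back into a named list of KO vectors, one entry per module
parts <- strsplit(readLines(out_file), "\t", fixed = TRUE)
module_kos <- setNames(
  lapply(parts, function(p) strsplit(p[2], ",", fixed = TRUE)[[1]]),
  vapply(parts, function(p) p[1], character(1))
)
print(module_kos[["M00011"]])
```

Dropping the header and row names keeps each line in exactly the `[MODULE]\t[KO_1, KO_2, ..., KO_N]` shape, which should make it easy to load into sets in whatever language is used downstream.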
This is amazing! Thank you so much. It works pretty much like a charm. However, I noticed a few odd things. Do you know why some of the KO descriptions end up in the output? For example:
11 M00011 K00164,K00658,K00382,K00174,K00175,K00177,K00176,K01902,K01903,K01899,K01900,K18118,K00234,K00235,K00236,K00237,K00239,K00240,K00241,K00242,K18859,K18860,K00244,K00245,K00246,K00247 fumarate reductase [EC:1.3.5.4] [RN:R02164],K01676,K01679,K01677+K01678,K00026,K00025,K00024,K00116
Also, is it possible to output which "version" of KEGG this is for when I store the file? That could be useful for accessing this in the future.
Also, using this method there are only 443 modules. What happened to other modules such as "M00080"? I'm not seeing these on the KEGG website but seeing them in previous publications.
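One possible workaround for the stray description text is to keep only the substrings that look like KO identifiers. The sketch below assumes every KO ID is a "K" followed by exactly five digits. The helper name `extract_kos` is hypothetical and not part of KEGGREST, and the input string is a shortened excerpt of the M00011 row quoted above.

```r
# Hypothetical helper (not part of KEGGREST): keep only KO identifiers.
# Assumes every KO ID is "K" followed by exactly five digits.
extract_kos <- function(x) {
  unique(regmatches(x, gregexpr("K[0-9]{5}", x))[[1]])
}

# Shortened excerpt of the M00011 row, including the unwanted description
mixed <- "K00246,K00247 fumarate reductase [EC:1.3.5.4] [RN:R02164],K01676,K01677+K01678"
kos <- extract_kos(mixed)
print(paste(kos, collapse = ","))
```

Note that this also splits complexes written as `K01677+K01678` into separate IDs, which may or may not be what you want for set operations. As for the version, KEGGREST's `keggInfo("kegg")` should return the release information of the database at query time, so saving that text alongside the table would be one way to record it.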