Hi,
I'm trying to get lists of genes belonging to KEGG pathways. I had been using MSigDB, but some of the KEGG pathways aren't in MSigDB, so for others I was getting them directly from KEGG.
I've realised that for at least one pathway, the lists of genes differ a lot between KEGG and MSigDB. For example, for apoptosis:
https://www.gsea-msigdb.org/gsea/msigdb/cards/KEGG_APOPTOSIS.html
https://www.genome.jp/dbget-bin/www_bget?pathway+hsa04210
There are 87 genes in the MSigDB list and 136 in the KEGG list, and only 57 gene symbols appear in both lists.
I've also tried the msigdbr and gage R packages, and the lists they give generally agree with MSigDB, whether I use gene symbols or Entrez IDs. With this many differences, it seems unlikely that the cause is simply one of the lists being outdated, and in any case MSigDB was last updated a couple of months ago. It also seems unlikely that three different secondary sources would all agree with each other and all be wrong.
So the question is, which, if any, of these sources should I trust? Any suggestions would be appreciated!
Thanks. I hadn't come across that site, but that gives a pretty clear resolution.
I thought it was strange that two other sources, MSigDB and the gage package (https://bioconductor.org/packages/release/bioc/html/gage.html), both had nearly the same lists, and both were very different from KEGG, but I guess they must both just be similarly out of date. It looks like gage's kegg.gs gene set data is sourced from the KEGG.db package (https://bioconductor.org/packages/release/data/annotation/html/KEGG.db.html), which is no longer updated.
You can try KEGGgraph, which downloads directly from KEGG, so you end up with the current version.
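To put numbers on how far two sources disagree (like the 87 vs 136 genes with 57 shared mentioned in the question), here is a minimal Python sketch. The helper names `parse_kegg_link` and `compare_gene_sets` are made up for illustration, and the parser assumes tab-separated lines of the form `path:hsa04210<TAB>hsa:355`, which is what I'd expect from a KEGG link-style export; adjust it if your file looks different. Compare on Entrez IDs rather than symbols where you can, since symbol aliases can inflate the apparent mismatch.

```python
def parse_kegg_link(text):
    """Collect gene IDs from tab-separated 'pathway<TAB>organism:gene' lines.

    The input layout is an assumption; malformed lines are skipped.
    """
    genes = set()
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) == 2 and ":" in parts[1]:
            # Drop the organism prefix, e.g. 'hsa:355' -> '355'
            genes.add(parts[1].split(":", 1)[1])
    return genes


def compare_gene_sets(a, b):
    """Return the genes shared by both lists and those unique to each."""
    a, b = set(a), set(b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


if __name__ == "__main__":
    # Toy IDs for illustration only, not real pathway contents.
    kegg_text = "path:hsa04210\thsa:355\npath:hsa04210\thsa:356\n"
    kegg_genes = parse_kegg_link(kegg_text)
    msigdb_genes = ["355", "999"]
    result = compare_gene_sets(msigdb_genes, kegg_genes)
    for key, ids in result.items():
        print(key, len(ids), ids)
```

Looking at the "only in one list" genes directly usually shows quickly whether the difference is an outdated snapshot or an ID-mapping problem.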