Web scraping in the KEGG db
4.2 years ago
bpvalderrama ▴ 30

Hi everyone!

I'm working on a project with 16S data and used Piphillin to infer functional capacities. The output was a .txt file with several ID numbers for pathways in the KEGG database. Is there a way to scrape the table with this information from this website? An R package, or anything else I could use to automate the match between the KEGG code and its biological function, would also work.

The lab I'm working with has almost no bioinformatics training, and they currently do the identification by looking up each code manually and then writing its biological significance in an Excel file.

Thanks for your answers :)

R KEGG • 1.8k views

You can use answers in:
Extracting List Of Genes Associated With A Pathway In Kegg
R: download all KEGG pathways including KO and Compounds

Be careful about doing bulk downloads. You may violate their acceptable use policy if you try to download the entire database.


For anyone else who is interested: I found the KEGG API pretty straightforward for this purpose.
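As a rough illustration of the KEGG API route, here is a minimal Python sketch that maps pathway IDs to pathway names. It assumes the `list` operation of the KEGG REST service at `https://rest.kegg.jp/list/pathway` returns tab-separated `ID<TAB>name` lines, and that the Piphillin IDs look like `ko00010`; neither detail is confirmed in this thread. The function names are hypothetical.

```python
import urllib.request

# KEGG REST "list" operation for pathways (assumed output: "ID<TAB>name" per line).
KEGG_LIST_URL = "https://rest.kegg.jp/list/pathway"


def parse_pathway_list(text):
    """Parse 'ID<TAB>name' lines into a {5-digit pathway number: name} dict."""
    names = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        entry_id, name = line.split("\t", 1)
        # Keep only the trailing 5 digits, so that map00010, ko00010 and
        # path:map00010 all end up under the same key.
        names[entry_id.strip()[-5:]] = name.strip()
    return names


def annotate(ids, names):
    """Pair each input ID with its pathway name, or 'NA' if it is unknown."""
    return [(i, names.get(i.strip()[-5:], "NA")) for i in ids]


def fetch_pathway_names(url=KEGG_LIST_URL):
    """Download the full pathway list in a single request and parse it."""
    with urllib.request.urlopen(url) as resp:
        return parse_pathway_list(resp.read().decode("utf-8"))


# Example usage (needs network access):
#   names = fetch_pathway_names()
#   for pid, name in annotate(["ko00010", "ko00020"], names):
#       print(pid, name, sep="\t")
```

Because the whole pathway list comes back in one request, the lookup itself runs locally and does not hit the server once per ID. The resulting pairs can be written to a tab-separated file that opens directly in Excel.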

4.2 years ago
svp ▴ 680

You can try TogoWS.

For example, to get the genes in the Alzheimer disease pathway (hsa05010) as JSON:

http://togows.dbcls.jp/entry/pathway/hsa05010/genes.json

[
  {
    "102": "ADAM10; ADAM metallopeptidase domain 10 [KO:K06704] [EC:3.4.24.81]",
    "6868": "ADAM17; ADAM metallopeptidase domain 17 [KO:K06059] [EC:3.4.24.86]",
    "351": "APP; amyloid beta precursor protein [KO:K04520]",
    "8883": "NAE1; NEDD8 activating enzyme E1 subunit 1 [KO:K04532]",
    "322": "APBB1; amyloid beta precursor protein binding family B member 1 [KO:K04529]",
    "2597": "GAPDH; glyceraldehyde-3-phosphate dehydrogenase [KO:K00134] [EC:1.2.1.12]",
    "23621": "BACE1; beta-secretase 1 [KO:K04521] [EC:3.4.23.46]",
    "25825": "BACE2; beta-secretase 2 [KO:K07747] [EC:3.4.23.45]",
    "10313": "RTN3; reticulon 3 [KO:K20723]",
    "57142": "RTN4; reticulon 4 [KO:K20720]",
    "55851": "PSENEN; presenilin enhancer, gamma-secretase subunit [KO:K06170]",
    "5663": "PSEN1; presenilin 1 [KO:K04505] [EC:3.4.23.-]",
    "5664": "PSEN2; presenilin 2 [KO:K04522] [EC:3.4.23.-]",
    "23385": "NCSTN; nicastrin [KO:K06171]",
    "51107": "APH1A; aph-1 homolog A, gamma-secretase subunit [KO:K06172]",
    "83464": "APH1B; aph-1 homolog B, gamma-secretase subunit [KO:K06172]",
    "3416": "IDE; insulin degrading enzyme [KO:K01408] [EC:3.4.24.56]",
    "4311": "MME; membrane metalloendopeptidase [KO:K01389] [EC:3.4.24.11]",
    "4137": "MAPT; microtubule associated protein tau [KO:K04380]",
    "4535": "ND1; NADH dehydrogenase, subunit 1 (complex I) [KO:K03878] [EC:7.1.1.2]",
    "4536": "ND2; MTND2 [KO:K03879] [EC:7.1.1.2]",
    "4537": "ND3; NADH dehydrogenase, subunit 3 (complex I) [KO:K03880] [EC:7.1.1.2]",
    "4538": "ND4; NADH dehydrogenase, subunit 4 (complex I) [KO:K03881] [EC:7.1.1.2]",
    "4539": "ND4L; NADH dehydrogenase, subunit 4L (complex I) [KO:K03882] [EC:7.1.1.2]",
    "4540": "ND5; NADH dehydrogenase, subunit 5 (complex I) [KO:K03883] [EC:7.1.1.2]",
    "4541": "ND6; NADH dehydrogenase, subunit 6 (complex I) [KO:K03884] [EC:7.1.1.2]",
    "4723": "NDUFV1; NADH:ubiquinone oxidoreductase core subunit V1 [KO:K03942] [EC:7.1.1.2]",
    "4729": "NDUFV2; NADH:ubiquinone oxidoreductase core subunit V2 [KO:K03943] [EC:7.1.1.2]",
    "4731": "NDUFV3; NADH:ubiquinone oxidoreductase subunit V3 [KO:K03944]",
    "4694": "NDUFA1; NADH:ubiquinone oxidoreductase subunit A1 [KO:K03945]",
    "4695": "NDUFA2; NADH:ubiquinone oxidoreductase subunit A2 [KO:K03946]",
    "4696": "NDUFA3; NADH:ubiquinone oxidoreductase subunit A3 [KO:K03947]",
    "4697": "NDUFA4; NDUFA4 mitochondrial complex associated [KO:K03948]",
    "56901": "NDUFA4L2; NDUFA4 mitochondrial complex associated like 2 [KO:K03948]",
    "4698": "NDUFA5; NADH:ubiquinone oxidoreductase subunit A5 [KO:K03949]",
    "4700": "NDUFA6; NADH:ubiquinone oxidoreductase subunit A6 [KO:K03950]",
    "4701": "NDUFA7; NADH:ubiquinone oxidoreductase subunit A7 [KO:K03951]",
    "4702": "NDUFA8; NADH:ubiquinone oxidoreductase subunit A8 [KO:K03952]",
    "4704": "NDUFA9; NADH:ubiquinone oxidoreductase subunit A9 [KO:K03953]",
    "4705": "NDUFA10; NADH:ubiquinone oxidoreductase subunit A10 [KO:K03954]",
    "4706": "NDUFAB1; NADH:ubiquinone oxidoreductase subunit AB1 [KO:K03955]",
    "126328": "NDUFA11; NADH:ubiquinone oxidoreductase subunit A11 [KO:K03956]",
    "55967": "NDUFA12; NADH:ubiquinone oxidoreductase subunit A12 [KO:K11352]",
    "51079": "NDUFA13; NADH:ubiquinone oxidoreductase subunit A13 [KO:K11353]",
    "4707": "NDUFB1; NADH:ubiquinone oxidoreductase subunit B1 [KO:K03957]",
    "4714": "NDUFB8; NADH:ubiquinone oxidoreductase subunit B8 [KO:K03964]",
    "4715": "NDUFB9; NADH:ubiquinone oxidoreductase subunit B9 [KO:K03965]",
    "4716": "NDUFB10; NADH:ubiquinone oxidoreductase subunit B10 [KO:K03966]",
    "54539": "NDUFB11; NADH:ubiquinone oxidoreductase subunit B11 [KO:K11351]",
    "4719": "NDUFS1; NADH:ubiquinone oxidoreductase core subunit S1 [KO:K03934] [EC:7.1.1.2]",
    "4720": "NDUFS2; NADH:ubiquinone oxidoreductase core subunit S2 [KO:K03935] [EC:7.1.1.2]",
    "4722": "NDUFS3; NADH:ubiquinone oxidoreductase core subunit S3 [KO:K03936] [EC:7.1.1.2]",
    "4724": "NDUFS4; NADH:ubiquinone oxidoreductase subunit S4 [KO:K03937]",
    "4725": "NDUFS5; NADH:ubiquinone oxidoreductase subunit S5 [KO:K03938]",
    "4726": "NDUFS6; NADH:ubiquinone oxidoreductase subunit S6 [KO:K03939]",
    "374291": "NDUFS7; NADH:ubiquinone oxidoreductase core subunit S7 [KO:K03940] [EC:7.1.1.2]",
    "4728": "NDUFS8; NADH:ubiquinone oxidoreductase core subunit S8 [KO:K03941] [EC:7.1.1.2]"
  }
]
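Each value in that JSON packs the gene symbol, the description, and the optional KO and EC identifiers into one string. A minimal Python sketch for splitting them apart, assuming every entry follows the `SYMBOL; description [KO:...] [EC:...]` pattern seen in the output above (the function names are hypothetical):

```python
import json
import re


def parse_gene_entry(value):
    """Split 'SYMBOL; description [KO:...] [EC:...]' into its parts."""
    ko = re.findall(r"\[KO:([^\]]+)\]", value)
    ec = re.findall(r"\[EC:([^\]]+)\]", value)
    # Drop the bracketed KO/EC tags, leaving 'SYMBOL; description'.
    head = re.sub(r"\s*\[(?:KO|EC):[^\]]+\]", "", value).strip()
    symbol, _, description = head.partition("; ")
    return {
        "symbol": symbol,
        "description": description,
        "ko": ko[0] if ko else None,  # None when the entry has no KO tag
        "ec": ec[0] if ec else None,  # None when the entry has no EC tag
    }


def parse_genes_json(text):
    """Turn the JSON text (a list of {gene_id: annotation} objects) into a dict."""
    records = json.loads(text)
    return {
        gene_id: parse_gene_entry(value)
        for obj in records
        for gene_id, value in obj.items()
    }
```

The result is keyed by the gene ID, so `parse_genes_json(text)["351"]["symbol"]` would give `APP` for the response shown above.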
4.2 years ago
MatthewP ★ 1.4k

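The R/Bioconductor package KEGGREST can be used to get KEGG pathway data; for example, keggList("pathway") returns the pathway IDs together with their names.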
