How to get KEGG pathway names from KEGG pathway IDs
7.1 years ago
Rose ▴ 20

Hello,

I'm trying to find the pathway names corresponding to KEGG pathway IDs of Mus musculus in R. My input file has the form below (column Kegg_ID):

  1. 4520
  2. 04145, 04514, 04650, 04670, 04810, 05140, 05144, 05146, 05150, 05323, 05416
  3. 4622
  4. 00561, 00564, 01100, 04070

I found some working code in other posts, but it assumes input files with only one ID per entry, whereas mine can have several comma-separated IDs in one entry. Please help me.

Thanks in advance.

7.1 years ago
EagleEye 7.6k

The GeneSCF 'prepare_database' module downloads the KEGG IDs, their descriptions and the genes associated with each term in a simple table format (plain text file).

./prepare_database -db=KEGG -org=mmu

The above command downloads the complete KEGG database as a simple text file to the following location: 'geneSCF-tool/class/lib/db/mmu/'.

7.1 years ago
   $ sed 's/, /\n/g' test.txt | awk '{printf("%05d\n", $1)}' | parallel wget -qO- http://togows.org/entry/kegg-pathway/mmu{}/pathways >> pathways.txt

output:

$ cat pathways.txt 
mmu04520  Adherens junction
mmu04145  Phagosome
mmu04514  Cell adhesion molecules (CAMs)
mmu04650  Natural killer cell mediated cytotoxicity
mmu04670  Leukocyte transendothelial migration
mmu04810  Regulation of actin cytoskeleton
mmu05140  Leishmaniasis
mmu05144  Malaria
mmu05146  Amoebiasis
mmu05150  Staphylococcus aureus infection
mmu05323  Rheumatoid arthritis
mmu05416  Viral myocarditis
mmu04622  RIG-I-like receptor signaling pathway
mmu00561  Glycerolipid metabolism
mmu00564  Glycerophospholipid metabolism
mmu01100  Metabolic pathways
mmu04070  Phosphatidylinositol signaling system

input:

$ cat test.txt 
4520
04145, 04514, 04650, 04670, 04810, 05140, 05144, 05146, 05150, 05323, 05416
4622
00561, 00564, 01100, 04070
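
The ID clean-up step in the one-liner above (split the comma-separated lists, pad each number to five digits, add the organism prefix) can be checked offline before any request is sent. Here is a minimal sketch of just that step; `normalize_ids` is a hypothetical helper name used for illustration, and `tr` is swapped in for `sed 's/, /\n/g'` because `\n` in a sed replacement is not guaranteed to work outside GNU sed.

```shell
# normalize_ids is a hypothetical helper, not part of any tool in this thread.
# It reads lines of comma-separated KEGG pathway numbers on stdin and prints
# one zero-padded, mmu-prefixed ID per line.
normalize_ids() {
  # Put every ID on its own line, then let awk drop the surrounding
  # whitespace, skip empty lines and pad the number to 5 digits.
  tr ',' '\n' |
    awk 'NF { printf("mmu%05d\n", $1) }'
}

# Example with two rows taken from the input shown above.
printf '%s\n' '4520' '04145, 04514, 04650' | normalize_ids
```

The resulting IDs are what gets substituted into the TogoWS URL in the pipeline above, so a malformed row shows up here rather than as an empty download.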

Hi, thanks for the code. It works fine, but in my file the second column holds 11 IDs for the gene 'Fer', so I need the corresponding pathway names separated by commas in a single column. To be more specific, all pathway names for that gene should be in one column, and similarly for all other genes.


Could you please update the example data and expected output?


This is the example data.

GENE_NAME KEGG_ID

  1. Klf6 NULL
  2. Fer 04145, 04514, 04650, 04670, 04810, 05140, 05144, 05146, 05150, 05323, 05416
  3. Itgb2l 04622

Expected output:

GENE_NAME KEGG_ID PathwayName

  1. Klf6 NULL
  2. Fer 04520 Adherens junction
  2. Fer 04145,04514...... Phagosome,Cell adhesion molecules (CAMs),Natural killer cell mediated cytotoxicity,Leukocyte transendothelial migration,Regulation of actin cytoskeleton,Leishmaniasis,Malaria,Amoebiasis,Staphylococcus aureus infection,Rheumatoid arthritis,Viral myocarditis

code:

$ awk -F"\t" '{split($2,a,","); for(i in a)print $1"\t"a[i]}' test.txt | awk -v OFS="\t" '{$1=$1}1' | awk '{ printf "%s\tmmu%05d\n", $1,$2 }' > test_out.txt

$ a=$(cut -f2 test_out.txt | tr '\n' ',' ) && wget -qO- http://togows.org/entry/kegg-pathway/$a/pathways | sed 's/ /\t/' > pathways.txt

$ join -t $'\t' -1 2 -2 1 <(sort -k2 test_out.txt) <(sort -k1 pathways.txt) | datamash -sg2 unique 1,3

Input (replace x1,x2,x3 and x4 with appropriate genes):

$ cat test.txt 
x1  4520
x2  04145, 04514, 04650, 04670, 04810, 05140, 05144, 05146, 05150, 05323, 05416
x3  4622
x4  00561, 00564, 01100, 04070

output:

x1  mmu04520     Adherens junction
x2  mmu04145,mmu04514,mmu04650,mmu04670,mmu04810,mmu05140,mmu05144,mmu05146,mmu05150,mmu05323,mmu05416   Amoebiasis, Cell adhesion molecules (CAMs), Leishmaniasis, Leukocyte transendothelial migration, Malaria, Natural killer cell mediated cytotoxicity, Phagosome, Regulation of actin cytoskeleton, Rheumatoid arthritis, Staphylococcus aureus infection, Viral myocarditis
x3  mmu04622     RIG-I-like receptor signaling pathway
x4  mmu00561,mmu00564,mmu01100,mmu04070  Glycerolipid metabolism, Glycerophospholipid metabolism, Metabolic pathways, Phosphatidylinositol signaling system
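
Note that in the output above `datamash unique` sorts the IDs and the names independently, so the n-th name does not necessarily belong to the n-th ID. If the pairing matters, or if some genes have `NULL` instead of an ID list (as in the example data earlier in the thread), the grouping can be done in a single awk pass over a local lookup table. This is only a sketch under assumptions: `annotate_pathways` is a hypothetical function name, the pathway table is assumed to be tab-separated `ID<TAB>name`, and the gene table is assumed to be tab-separated `gene<TAB>comma-separated IDs`.

```shell
# annotate_pathways is a hypothetical helper, not part of any tool in this thread.
# $1: pathway table, tab-separated (e.g. mmu04145<TAB>Phagosome)
# $2: gene table, tab-separated (gene<TAB>comma-separated pathway numbers or NULL)
annotate_pathways() {
  awk -F'\t' -v OFS='\t' '
    # First file: build the ID -> name lookup.
    NR == FNR { name[$1] = $2; next }
    # Second file: translate each ID list, keeping the original order.
    {
      n = split($2, ids, /, */)
      id_out = ""; name_out = ""
      for (i = 1; i <= n; i++) {
        if (ids[i] !~ /^[0-9]+$/) continue        # skip NULL or malformed entries
        key = sprintf("mmu%05d", ids[i])          # pad to 5 digits, add prefix
        label = (key in name) ? name[key] : "NA"  # NA marks IDs missing from the table
        id_out = id_out (id_out == "" ? "" : ",") key
        name_out = name_out (name_out == "" ? "" : ", ") label
      }
      print $1, (id_out == "" ? "NULL" : id_out), (name_out == "" ? "NULL" : name_out)
    }' "$1" "$2"
}

# Small self-contained demo with two pathways and two genes.
demo_dir=$(mktemp -d)
printf 'mmu04145\tPhagosome\nmmu04514\tCell adhesion molecules (CAMs)\n' > "$demo_dir/pathways.tsv"
printf 'Klf6\tNULL\nFer\t04145, 04514\n' > "$demo_dir/genes.tsv"
annotate_pathways "$demo_dir/pathways.tsv" "$demo_dir/genes.tsv"
rm -r "$demo_dir"
```

With real data, the first argument could be the `pathways.txt` produced by the commands above (provided it really is tab-separated) and the second the original gene table, so no `join` or `datamash` step is needed.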

Sorry for the late reply. I was able to get the results this way. Thanks a lot.
