Tool: kg: query KEGG from the command line
9.4 years ago

Since I often have columnar files I need to annotate with KEGG data, I wrote a dinky script that does it for me. Perhaps it will be of use to someone else too?

The example below shows a columnar (tab-separated) file. Its header row has no name for the first column, which holds the gene names.

$ head examples/no_index_header.tsv
logFC   AveExpr
Ipcef1  -2.70987558746701   4.80047582653889
Sema3b  2.00143465979322    3.82969788437155
Rab26   -2.40250648553797   5.57320249609294
Arhgap25    -1.84668909768998   3.66617832656769
Ociad2  -1.99052684394044   5.26213130909702
Mmp17   -2.01026790614161   4.88012776225311
C4a 2.22003976804983    3.52842041243544
Gna14   -2.42391191670209   1.56313048066253
Kcna6   -1.74168813159872   6.54586068659631

Now, using the command

$ kg -s rno -m 0 -d examples/no_index_header.tsv

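KEGG data for rat (-s rno) related to the gene in column 0 (-m 0) is added to the file, with definitions included (-d). A gene with more than one KEGG match (Gna14 below) gets one output row per match.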

index   logFC   AveExpr kegg_pathway    kegg_definition
Ipcef1  -2.70987558746701   4.80047582653889    361474   interaction protein for cytohesin exchange factors 1
Sema3b  2.00143465979322    3.82969788437155    363142   sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3B; K06840 semaphorin 3
Rab26   -2.40250648553797   5.57320249609294    171111   RAB26, member RAS oncogene family; K07913 Ras-related protein Rab-26
Arhgap25    -1.84668909768998   3.66617832656769    500246   Rho GTPase activating protein 25
Ociad2  -1.99052684394044   5.26213130909702    100361733    OCIA domain containing 2
Mmp17   -2.01026790614161   4.88012776225311    288626   matrix metallopeptidase 17; K07997 matrix metalloproteinase-17 (membrane-inserted) [EC:3.4.24.-]
C4a 2.22003976804983    3.52842041243544    24233    complement component 4A (Rodgers blood group); K03989 complement component 4
Gna14   -2.42391191670209   1.56313048066253    309242   guanine nucleotide binding protein, alpha 14; K04636 guanine nucleotide-binding protein subunit alpha-14
Gna14   -2.42391191670209   1.56313048066253    314046   ankyrin repeat and MYND domain containing 2
Kcna6   -1.74168813159872   6.54586068659631    64358    potassium channel, voltage gated shaker related subfamily A, member 6; K04879 potassium voltage-gated channel Shaker-related subfamily A member 6
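
For anyone wondering what the annotation step amounts to, here is a minimal pandas sketch of a left merge that produces the same shape of output. It is not kg's actual code: the kegg lookup table is hand-built from a few rows of the output above, and the column name gene is my own assumption for illustration.

```python
import io

import pandas as pd

# Three rows copied from the example input above. The header lacks a name
# for the first column, so we skip it and supply all three names ourselves.
input_tsv = io.StringIO(
    "logFC\tAveExpr\n"
    "Ipcef1\t-2.70987558746701\t4.80047582653889\n"
    "Gna14\t-2.42391191670209\t1.56313048066253\n"
    "Kcna6\t-1.74168813159872\t6.54586068659631\n"
)
df = pd.read_csv(
    input_tsv, sep="\t", header=None, skiprows=1,
    names=["index", "logFC", "AveExpr"],
)

# Hypothetical lookup table, hand-copied from the output above.
# In kg this data would come from KEGG; here it is hard-coded.
kegg = pd.DataFrame({
    "gene": ["Ipcef1", "Gna14", "Gna14", "Kcna6"],
    "kegg_pathway": ["361474", "309242", "314046", "64358"],
    "kegg_definition": [
        "interaction protein for cytohesin exchange factors 1",
        "guanine nucleotide binding protein, alpha 14; K04636 guanine "
        "nucleotide-binding protein subunit alpha-14",
        "ankyrin repeat and MYND domain containing 2",
        "potassium channel, voltage gated shaker related subfamily A, "
        "member 6; K04879 potassium voltage-gated channel Shaker-related "
        "subfamily A member 6",
    ],
})

# A left merge keeps every input row and repeats it once per KEGG match,
# which is why Gna14 shows up twice.
merged = df.merge(kegg, how="left", left_on="index", right_on="gene")
merged = merged.drop(columns="gene")

print(merged.to_csv(sep="\t", index=False))
```

With how="left", genes that have no KEGG match would still be kept, just with empty KEGG columns.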

Note that you can do the reverse and get genes from KEGG ids too: pass -g and point -m at the column containing the KEGG ids. Finally, if you enter nothing but a species (kg -s rno), all data for that species is dumped to stdout.

kg also exposes a (Python) function called get_kegg(species) in the module kg.lib.

get_kegg downloads all genes, KEGG ids and KEGG id definitions for that species, parses the data and returns it in a pandas dataframe.
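
To give a rough idea of the "parses the data" part, here is a small offline sketch that turns KEGG-style listing text into a dataframe. It does not contact KEGG and it is not the code inside get_kegg. The function name parse_kegg_list, the column names, and the line format (an org:id field, a tab, then "symbol; definition") are all assumptions for illustration, and the live service may format its lines differently. The sample ids and definitions are taken from the output above.

```python
import pandas as pd

# Illustrative sample text in an ASSUMED format: "org:id<TAB>symbol; definition".
SAMPLE = (
    "rno:171111\tRab26; RAB26, member RAS oncogene family; "
    "K07913 Ras-related protein Rab-26\n"
    "rno:500246\tArhgap25; Rho GTPase activating protein 25\n"
)


def parse_kegg_list(text):
    """Parse listing text into a dataframe (hypothetical helper, not part of kg)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        kegg_id, rest = line.split("\t", 1)
        # Split only on the first "; " so the definition keeps its own semicolons.
        gene, _, definition = rest.partition("; ")
        rows.append({
            "gene": gene,
            "kegg_id": kegg_id.split(":", 1)[1],  # drop the "rno:" prefix
            "kegg_definition": definition,
        })
    return pd.DataFrame(rows, columns=["gene", "kegg_id", "kegg_definition"])


kegg_df = parse_kegg_list(SAMPLE)
print(kegg_df)
```

The ids are kept as strings so they are not silently converted to integers.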

pip install kg

Note that the install is rather expensive: recent versions of pandas, biopython, joblib and docopt are installed along with it.

Full command line interface:

kg

Get KEGG data from the command line.
(Visit github.com/endrebak/kg for examples and help.)

Usage:
    kg --help
    kg --mergecol=COL --species=SPEC [--genes] [--definitions] [--noheader] FILE
    kg --species=SPEC
    kg --removecache

Arguments:
    FILE                    infile to add KEGG data to (read STDIN with -)
    -s SPEC --species=SPEC  name of species (examples: hsa, mmu, rno...)
    -m COL --mergecol=COL   column (0-indexed int or name) containing gene names

Options:
    -h --help               show this message
    -n --noheader           the input data does not contain a header
    -d --definitions        add KEGG pathway definitions to the output
    -g --genes              get the genes related to KEGG pathways
                            (when used, mergecol COL should contain KEGG pathway
                            ids)
    --removecache           removes the local cache so that the KEGG REST DB is
                            accessed anew
Cool! But this doesn't cover plants? I got zero results
