I cannot find a way to do this from kegg mapper, I think I need a script to do this.
From the KEGGREST package of Bioconductor it seams possible but low efficient.
You don't say what kind of gene IDs you have. If you have, or can convert to KEGG gene IDs (such as eco:b0002), genes -> pathways is easy using the TogoWS REST service.
curl http://togows.dbcls.jp/entry/genes/eco:b0002/pathways.json
[
{
"eco00260": "Glycine, serine and threonine metabolism",
"eco00270": "Cysteine and methionine metabolism",
"eco00300": "Lysine biosynthesis",
"eco01100": "Metabolic pathways",
"eco01110": "Biosynthesis of secondary metabolites",
"eco01120": "Microbial metabolism in diverse environments",
"eco01230": "Biosynthesis of amino acids"
}
]
I tried this, and it seams super for a short list of genes, isn't it? But I have a list of near 1000 genes, does there anyway to do this in a "large scale"? Thanks again.
Here's a cleaner version that assumes you've cleaned up the header and footer from the REST output.
Also, hopefully, this example is expressive enough so you can tweak it for future use.
import sys
#
# Handle i/o filenames
#
input_fname = sys.argv[1]
output_fname = sys.argv[2]
#
# Process data
#
data_dict = {}
for line in file(input_fname):
line = line.strip()
spl = map(lambda x: x.strip(), line.split() ) # Default split by space, clean extra whitespace(s)
pathway = spl[0] # Pathway name
genes = spl[1:] # List of genes
for gene in genes:
current_pathways = data_dict.get(gene, []) # Obtain existing list of pathways associated with gene
current_pathways.append(pathway) # Append to existing list of pathways, if the list exists
current_pathways = list(set(current_pathways)) # Make pathway list unique
data_dict[gene] = current_pathways # Let's update the dictionary
#
# Flush output to file
#
ofile = file(output_fname, "w")
for gene, pathway_list in data_dict.iteritems():
pathways = " ".join(pathway_list) # Space separated list of pathways, or your choice of delimiter
ofile.write("%(gene)s\t%(pathways)s\n" % dict(pathways=pathways, gene=gene))
ofile.close()
#!/use/bin/env python
import sys
paths={}
for l in sys.stdin:
p=l.rstrip().split()
for i in range(1,len(p)):
if p[i] not in paths:paths[p[i]]=[]
paths[p[i]].append(p[0])
for k in paths:
print(k,'\t'.join(paths[k]),sep='\t')
I didn't bebug the above script, but I think it would work. I know there might be some indent issues if you just copy the python script from Biostars.
What IDs do you have for gene1, gene2 etc. ? Are they KEGG gene IDs, such as eco:b0002?
see Get all the Genes in a Pathway (e.g. KEGG)
yes, I read that before, but what I want is exactly he does not want, I want list of gene and which pathway it envoles.
Using bash, make a
for loop
of unique gene names, andgrep
rows where gene name matches, get first column...