Hi guys,
I have a metagenome sample and have identified several species/families in it. Now I'm thinking of automatically gathering pathways (just the names, though) for each of those species, preferably in R. Any ideas for a database or API from which I can download pathway names given the species name?
Thanks,
Phil
To clarify: basically, what I need is to type 'Gardnerella vaginalis' into some KEGG (or other) API and retrieve the list of pathways. Here is just a snapshot:
gvg00010 Glycolysis / Gluconeogenesis - Gardnerella vaginalis ATCC 14019
gvg00030 Pentose phosphate pathway - Gardnerella vaginalis ATCC 14019
gvg00040 Pentose and glucuronate interconversions - Gardnerella vaginalis ATCC 14019
gvg00051 Fructose and mannose metabolism - Gardnerella vaginalis ATCC 14019
gvg00052 Galactose metabolism - Gardnerella vaginalis ATCC 14019
gvg00061 Fatty acid biosynthesis - Gardnerella vaginalis ATCC 14019
gvg00071 Fatty acid degradation - Gardnerella vaginalis ATCC 14019
gvg00072 Synthesis and degradation of ketone bodies - Gardnerella vaginalis ATCC 14019
gvg00121 Secondary bile acid biosynthesis - Gardnerella vaginalis ATCC 14019
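KEGG serves this listing through its REST interface: `https://rest.kegg.jp/list/pathway/<org>` returns one tab-separated line per pathway for the organism code `<org>` (here `gvg`, as in the snapshot). Below is a minimal shell sketch that assumes curl is installed and the organism code is already known. The function names are hypothetical. The sketch strips a leading `path:` in case the response includes one, and it keeps only the name by cutting everything after the last " - ".

```bash
#!/usr/bin/env bash
# Sketch: pathway names for one KEGG organism via the KEGG REST API.
# Assumes curl is installed and the KEGG organism code is already known
# ("gvg" = Gardnerella vaginalis ATCC 14019, as in the listing above).
set -o pipefail

# Drop an optional "path:" prefix on pathway IDs, leaving
# "<pathway id><TAB><pathway name - organism name>".
clean_pathways() {
  sed -e 's/^path://'
}

# Keep only the pathway name: take field 2 and cut the trailing
# " - <organism name>" (everything after the last " - ").
pathway_names() {
  cut -f2 | sed -e 's/\(.*\) - .*/\1/'
}

# Download the pathway list for one organism code.
kegg_pathways() {
  curl -fsS "https://rest.kegg.jp/list/pathway/$1" | clean_pathways
}

# Usage (needs network access):
#   kegg_pathways gvg > gvg_pathways.tsv    # IDs and full names
#   kegg_pathways gvg | pathway_names       # names only
```

From R, the Bioconductor KEGGREST package wraps the same endpoint, e.g. `keggList("pathway", "gvg")`.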
Thanks, will have a look at it!
This definitely looks like a great solution, will try it. Thanks!
OK, so after looking into it, it really seems like the way I have to go. ;) Thanks for that! The problem is that I don't have any contigs or anything else; I just have the species name. Any ideas how to handle that?
Well, you can always download the annotated GBK (GenBank) files from NCBI for the different species. Put them all in a folder and use
cat *.gbk > test.gbk
and then, keeping your fingers crossed that the enzymes are annotated, you can follow the one-liners from there onwards. Read this post of mine.
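Before following the one-liners, it is worth checking whether the concatenated file actually carries enzyme annotations. In GenBank flat files, enzyme annotations appear as `/EC_number="..."` qualifiers on CDS features. The shell sketch below lists the distinct EC numbers in `test.gbk`. If it prints nothing, there are no EC numbers to map onto pathways. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list the distinct EC numbers annotated in a GenBank flat file,
# e.g. the combined test.gbk produced by `cat *.gbk > test.gbk`.
# An empty result means no /EC_number qualifiers were found.
ec_numbers() {
  grep -o '/EC_number="[^"]*"' "$1" \
    | sed -e 's/^\/EC_number="//' -e 's/"$//' \
    | sort -u
}

# Usage:
#   ec_numbers test.gbk           # one EC number per line
#   ec_numbers test.gbk | wc -l   # how many distinct enzymes are annotated
```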
I guess this will be my workaround if everything else fails. What I'm considering at the moment is the KEGGREST R package from Bioconductor. The only problem I have right now is that I'm not sure how to get the initial list of available pathways in the first place. Once I have that, I can just grep for the T genome numbers, download those entries and do whatever I want with them. This might reduce traffic and therefore time.
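KEGG's organism table, `https://rest.kegg.jp/list/organism`, can serve as that first list. It has one line per genome, tab-separated, with columns for T number, organism code, full name and lineage (column order assumed here). Downloading it once and searching it locally for each species keeps the number of requests down. The shell sketch below makes that assumption and uses hypothetical function names.

```bash
#!/usr/bin/env bash
# Sketch: map species names to KEGG T numbers and organism codes.
# Assumes curl is installed and that list/organism returns tab-separated
# columns: T number, organism code, full name, lineage.
set -o pipefail

# Download the full organism table (do this once and cache it locally).
kegg_organisms() {
  curl -fsS "https://rest.kegg.jp/list/organism"
}

# Read the organism table on stdin and print "T number<TAB>code<TAB>name"
# for every entry whose name contains the given species (case-insensitive).
find_organism() {
  awk -F'\t' -v sp="$1" \
    'index(tolower($3), tolower(sp)) > 0 { print $1 "\t" $2 "\t" $3 }'
}

# Usage (first line needs network access):
#   kegg_organisms > organisms.tsv
#   find_organism 'Gardnerella vaginalis' < organisms.tsv
#   find_organism 'Gardnerella vaginalis' < organisms.tsv | cut -f1 > IDs.txt
```

A species usually matches several strains, so expect more than one line per name.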
I'll keep you posted on what works best anyway! If you have any other ideas, let me know. Thanks for your help!
Okay, I spent the last 20 minutes thinking this over, and here is a solution (I have to go somewhere now and did it in a hurry, so excuse the long one-liner, but I think I've done it right):
Step 1: Go to the website, find the organisms/species you are interested in, get their T numbers (the genome IDs), and store them in IDs.txt.
Step 2: Now use the following to extract the pathways for your species/organism:
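Here is a minimal sketch of what Step 2 can look like. It is a stand-in, not the original one-liner. It reads `IDs.txt` (assumed to hold one ID per line) and queries the KEGG REST endpoint `list/pathway/<org>` once per ID. It also assumes the endpoint accepts a T number in place of the organism code; if it does not, put the organism codes in `IDs.txt` instead. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of Step 2 (not the original one-liner): for each genome ID in a file,
# one per line, print "ID<TAB>pathway ID<TAB>pathway name".
# Assumes curl is installed and that list/pathway/<org> accepts a T number.
set -o pipefail

# Prefix every input line with the genome ID, dropping any "path:" prefix.
tag_with_id() {
  awk -v id="$1" '{ sub(/^path:/, ""); print id "\t" $0 }'
}

# Loop over the IDs file and query KEGG once per genome.
pathways_for_ids() {
  local id
  while IFS= read -r id || [ -n "$id" ]; do
    id="${id%%[[:space:]]*}"   # cut at the first whitespace (drops trailing CR/spaces)
    [ -n "$id" ] || continue   # skip blank lines
    curl -fsS "https://rest.kegg.jp/list/pathway/${id}" | tag_with_id "$id" \
      || echo "warning: request failed for ${id}" >&2
    sleep 1                    # pause between requests to go easy on the server
  done < "$1"
}

# Usage (needs network access):
#   pathways_for_ids IDs.txt > pathways.tsv
```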
Thank you so much for investing so much time in this! I really appreciate it!
Hey,
sorry to disturb you again, but why am I not able to redirect the output into a file instead of having it on the terminal? Is there any special way to do that?
I was trying it with:
or
but somehow this just creates the file and leaves it empty (both times).
edit:
solved it. THANK YOU SO MUCH!!
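The actual fix isn't described here. For anyone hitting the same empty-file problem, below is a sketch of the usual redirection patterns. It covers two common causes: the output is going to stderr, which a plain `>` does not capture, or the redirect applies only to the last command of a `cmd1; cmd2 > file` one-liner. `demo_step` is a hypothetical stand-in for the real command.

```bash
#!/usr/bin/env bash
# Sketch of common ways to capture a one-liner's output in a file.
# demo_step is a hypothetical stand-in for the real command.
demo_step() {
  echo "result line"            # normal output -> stdout
  echo "note: fetched $1" >&2   # messages -> stderr, NOT captured by a plain '>'
}

outdir="$(mktemp -d)"

# 1. Plain '>' captures stdout only; the stderr line still shows on the terminal.
demo_step A > "$outdir/stdout_only.txt"

# 2. Capture both streams: redirect stdout first, then point stderr at it.
demo_step B > "$outdir/both.txt" 2>&1

# 3. In 'cmd1; cmd2 > file' only cmd2 is redirected; group the commands instead.
{ demo_step C; demo_step D; } > "$outdir/grouped.txt" 2>&1

# 4. Keep stdout on screen and write a copy of it to a file.
demo_step E | tee "$outdir/tee.txt"
```

The order in pattern 2 matters: `2>&1 > file` sends stderr to wherever stdout pointed *before* the file redirect, i.e. the terminal.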