Download Pathways associated with species
10.6 years ago
Phil S. ▴ 700

Hi guys,

I have a metagenome sample and have identified several species and families in it. Now I would like to automatically gather the pathways (just the names) for each of those species, preferably in R. Any ideas for a database or API from which I can download pathway names given the species name?

Thanks,
Phil

To clarify: what I need is to type 'Gardnerella vaginalis' into some KEGG (or any other) API and get back its list of pathways. Here is just a snippet:

gvg00010             Glycolysis / Gluconeogenesis - Gardnerella vaginalis ATCC 14019 
gvg00030             Pentose phosphate pathway - Gardnerella vaginalis ATCC 14019 
gvg00040             Pentose and glucuronate interconversions - Gardnerella vaginalis ATCC 14019 
gvg00051             Fructose and mannose metabolism - Gardnerella vaginalis ATCC 14019 
gvg00052             Galactose metabolism - Gardnerella vaginalis ATCC 14019 
gvg00061             Fatty acid biosynthesis - Gardnerella vaginalis ATCC 14019 
gvg00071             Fatty acid degradation - Gardnerella vaginalis ATCC 14019 
gvg00072             Synthesis and degradation of ketone bodies - Gardnerella vaginalis ATCC 14019 
gvg00121             Secondary bile acid biosynthesis - Gardnerella vaginalis ATCC 14019
pathways metagenomes crawling • 5.0k views
10.6 years ago


Thanks, will have a look at it!


This definitely looks like a great solution, will try it. Thanks!


OK, so after looking into it, this really seems like the way to go... ;) Thanks for that. The problem is that I don't have any contigs or anything else; I just have the species names, that's all. Any ideas how to handle that?


Well, you can always download the annotated GBK files from NCBI for the different species. Put them all in a folder, run cat *.gbk > test.gbk and then, keeping your fingers crossed that the enzymes are annotated, you can follow the one-liners from there onwards.

Read this post of mine.


I guess this will be my workaround if everything else fails. What I am thinking about at the moment is using the KEGGREST R package from Bioconductor. The only problem right now is that I'm not sure how to get the initial list of available pathways. Once I have that, I can just 'grep' for the T (genome) numbers, download those and do whatever I want with them. This might reduce traffic and therefore time.

I will keep you posted on what worked best anyway! If you have any other ideas, let me know! And thanks for your help!
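
For looking up T numbers from a species name without browsing the website, here is a minimal sketch. It assumes that the KEGG REST `list/organism` endpoint returns one tab-separated line per organism (T number, organism code, name, lineage). `find_organism` is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# find_organism: filter a KEGG organism table (read from stdin) by species name.
# Assumed input columns (tab-separated): T number, organism code, name, lineage.
# Prints "T number <TAB> organism code <TAB> name" for every case-insensitive match.
find_organism() {
    awk -F'\t' -v q="$1" '
        BEGIN { q = tolower(q) }
        index(tolower($3), q) { print $1 "\t" $2 "\t" $3 }
    '
}

# Example (needs network access):
#   curl -s http://rest.kegg.jp/list/organism | find_organism "Gardnerella vaginalis"
```

Saving the organism table to a local file once and filtering that file would avoid repeated requests when many species have to be looked up.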


Okay, I spent the last 20 minutes thinking this over, and here is the solution (I have to go somewhere now and did it in a hurry, so excuse the long one-liner, but I think I got it right):

Step 1: Go to the website, see which organisms/species you are interested in, get their T numbers (the KEGG genome IDs), and store them in IDs.txt:

$ cat IDs.txt
T01329
T02919
T01060
T02994

Step 2: Now use the following to extract the pathways for your species/organism:

$ for i in $(cat IDs.txt); do echo $(curl -s http://rest.kegg.jp/link/pathway/genome:$i | grep -Po '(?<=path:).*') | awk '{gsub("[a-zA-Z]+","",$0);}1'| xargs -n 1 | xargs -I {} curl -s http://rest.kegg.jp/find/pathway/{} | awk -v k=$i '{print k"\t"$0}'  ; done
T01329    path:map00010    Glycolysis / Gluconeogenesis
T01329    path:map00020    Citrate cycle (TCA cycle)
T01329    path:map00030    Pentose phosphate pathway
T01329    path:map00040    Pentose and glucuronate interconversions
T01329    path:map00051    Fructose and mannose metabolism
T01329    path:map00052    Galactose metabolism
T01329    path:map00053    Ascorbate and aldarate metabolism
T01329    path:map00061    Fatty acid biosynthesis
T01329    path:map00062    Fatty acid elongation
T01329    path:map00071    Fatty acid degradation
T01329    path:map00072    Synthesis and degradation of ketone bodies
T01329    path:map00100    Steroid biosynthesis
T01329    path:map00120    Primary bile acid biosynthesis
T01329    path:map00130    Ubiquinone and other terpenoid-quinone biosynthesis
T01329    path:map00140    Steroid hormone biosynthesis
T01329    path:map00190    Oxidative phosphorylation
T01329    path:map00230    Purine metabolism
T01329    path:map00232    Caffeine metabolism
T01329    path:map00240    Pyrimidine metabolism
T01329    path:map00250    Alanine, aspartate and glutamate metabolism
T01329    path:map00260    Glycine, serine and threonine metabolism
T01329    path:map00270    Cysteine and methionine metabolism
T01329    path:map00280    Valine, leucine and isoleucine degradation
T01329    path:map00290    Valine, leucine and isoleucine biosynthesis
T01329    path:map00300    Lysine biosynthesis
T01329    path:map00310    Lysine degradation
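
The one-liner can be unpacked into small functions, which may be easier to read and debug. This is only a sketch: it issues the same `link/pathway/genome:<T number>` and `find/pathway/<number>` requests as the one-liner, the function names are made up for illustration, and `sort -u` is an addition that drops duplicate pathway numbers before the second round of requests.

```shell
#!/usr/bin/env bash
# pathway_numbers: read KEGG "link" output on stdin and print the unique
# numeric part of every "path:<org><number>" identifier.
pathway_numbers() {
    sed -n 's/.*path:[a-z]*\([0-9][0-9]*\).*/\1/p' | sort -u
}

# tag_lines: prefix every stdin line with a label (here the T number) and a tab.
tag_lines() {
    awk -v k="$1" '{ print k "\t" $0 }'
}

# pathways_for_genome: the same two requests as the one-liner, for one T number.
pathways_for_genome() {
    curl -s "http://rest.kegg.jp/link/pathway/genome:$1" |
        pathway_numbers |
        while read -r num; do
            curl -s "http://rest.kegg.jp/find/pathway/$num"
        done |
        tag_lines "$1"
}

# Example (needs network access):
#   while read -r id; do pathways_for_genome "$id"; done < IDs.txt
```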

Thank you so much for investing that much time in it! I really appreciate it!


Hey,

Sorry to disturb you again, but why am I not able to redirect the output into a file instead of having it printed on the terminal? Is there a special way to do that?

I tried it with:

...| awk -v k=$i '{print k"\t"$0 > "./foo.txt"}'  ; done

or

...| awk -v k=$i '{print k"\t"$0}' ; done > ./foo.txt

but somehow this just creates the file and leaves it empty (both times).

edit:

solved it. THANK YOU SO MUCH!!
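
The thread does not say what the fix was, so the following is only a sketch of how the two redirect styles differ in general. It uses a stand-in `fetch` function (hypothetical, no network access) in place of the curl pipeline. A `>` inside awk truncates the file every time a new awk process opens it, so only the last loop iteration survives, while a redirect placed after `done` opens the file once for the whole loop.

```shell
#!/usr/bin/env bash
# Stand-in for the KEGG pipeline: prints one fixed line per call (hypothetical helper).
fetch() { printf 'path:map00010\tGlycolysis / Gluconeogenesis\n'; }

workdir=$(mktemp -d)

# Redirect inside awk: every iteration starts a new awk process, which
# truncates the file, so only the last iteration's lines remain.
for i in T01329 T02919; do
    fetch "$i" | awk -v k="$i" -v f="$workdir/inside.txt" '{ print k "\t" $0 > f }'
done

# Redirect on the loop: the file is opened once and collects every iteration.
for i in T01329 T02919; do
    fetch "$i" | awk -v k="$i" '{ print k "\t" $0 }'
done > "$workdir/outside.txt"

# Compare the line counts of the two files.
wc -l < "$workdir/inside.txt"
wc -l < "$workdir/outside.txt"
```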

10.6 years ago

The NCBI BioSystems database provides a file that maps each BSID to a taxon ID:

ftp://ftp.ncbi.nih.gov/pub/biosystems/CURRENT/biosystems_taxonomy.gz


This does not give me the pathways, does it?

10.6 years ago
Prakki Rama ★ 2.7k

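Using the KEGG REST API, I wrote the following Perl script. Is this exactly what you wanted?

use strict;
use warnings;

# pathwayIds.txt: one KEGG organism code per line
open(my $fh, '<', 'pathwayIds.txt') or die "Cannot open pathwayIds.txt: $!";
while (my $org = <$fh>) {
    chomp $org;                 # strip the trailing newline before building the URL
    next unless length $org;    # skip empty lines
    system('wget', "http://rest.kegg.jp/list/pathway/$org");
}
close($fh);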

INPUT: pathwayIds.txt

gva
gvg
gvh

If you have just one organism, type the following in the terminal:

wget http://rest.kegg.jp/list/pathway/gvg
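
Since the original question asks for just the pathway names, the downloaded list could be post-processed along these lines. This is a sketch that assumes each line is tab-separated (pathway ID, then "name - organism") and that the organism is always the part after the last " - ". `strip_organism` and `pathway_names` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# strip_organism: drop the trailing " - <organism name>" from each stdin line.
# The greedy match cuts at the LAST " - ", so names that contain " - " survive.
strip_organism() {
    sed 's/\(.*\) - .*/\1/'
}

# pathway_names: keep only the second tab-separated column (the pathway name).
pathway_names() {
    strip_organism | cut -f2
}

# Example (needs network access), one organism code per line in pathwayIds.txt:
#   while read -r org; do
#       curl -s "http://rest.kegg.jp/list/pathway/$org" | strip_organism
#   done < pathwayIds.txt > pathways.txt
```

Using `strip_organism` alone keeps the pathway ID next to each name, which makes it easy to tell afterwards which organism a line came from.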

Thank you for adding the answers!
