With KEGG, it is possible to retrieve the amino acid sequence of the protein corresponding to a gene in FASTA format, as follows:
Retrieve sequence entries in FASTA format:
http://www.genome.jp/dbget-bin/www_bget?-f+db1:entry1+db2:entry2+...
http://www.genome.jp/dbget-bin/www_bget?-f+db+entry1+entry2+...
When the entry contains multiple sequences, specify as follows:
-f+-n+1 first sequence in FASTA format
-f+-n+2 second sequence in FASTA format
-f+-n+a amino acid sequence in FASTA format (KEGG GENES only)
-f+-n+n nucleotide sequence in FASTA format (KEGG GENES only)
(Examples)
http://www.genome.jp/dbget-bin/www_bget?-f+hsa:351
http://www.genome.jp/dbget-bin/www_bget?-f+-n+a+hsa:351
http://www.genome.jp/dbget-bin/www_bget?-f+-n+2+hsa:351
The list of options may be viewed by the -h option:
http://www.genome.jp/dbget-bin/www_bget?-h
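The URL patterns above can be assembled programmatically. Here is a minimal Python sketch that only builds the URL string and does not contact the server. The function name `dbget_fasta_url` and its `which` parameter are made up for illustration; the base URL and the `-f` / `-n` options are the ones quoted above.

```python
# Base URL taken from the examples quoted above.
BASE_URL = "http://www.genome.jp/dbget-bin/www_bget"


def dbget_fasta_url(entries, which=None):
    """Build a DBGET URL that requests FASTA output.

    entries: list of identifiers such as ["hsa:351"] (db:entry form).
    which:   optional selector passed to -n: 1, 2, "a" (amino acid)
             or "n" (nucleotide); the last two are KEGG GENES only.
    """
    if not entries:
        raise ValueError("at least one entry is required")
    parts = ["-f"]
    if which is not None:
        parts += ["-n", str(which)]
    parts += list(entries)
    # DBGET separates options and entries with '+'.
    return BASE_URL + "?" + "+".join(parts)
```

For example, `dbget_fasta_url(["hsa:351"], which="a")` reproduces the second example URL above. Fetching the page (for instance with `urllib.request.urlopen`) is left out on purpose, since the response format is not described here.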
This approach has some limitations: it returns only one copy of a gene (even when multiple copies exist), and it does not print any sequence if the gene is not annotated with the gene name being searched for, as in this example:
BAU: BUAPTUC7_480(folD)
WBR: WGLp242(folD)
SGL: SG0706
ENT: Ent638_0986
ENC: ECL_01277
ESA: ESA_02756
where BAU, WBR, etc. are the KEGG organism IDs and BUAPTUC7_480, WGLp242, etc. are the gene codes. As you can see, the folD orthologs in SGL, ENT, ENC and ESA are not marked with "(folD)", which limits sequence retrieval by gene name.
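Because the gene codes are present in the listing even when the "(folD)" label is missing, one workaround is to parse the listing directly instead of searching by gene name. The Python sketch below is only an illustration: the function name `parse_gene_list` is hypothetical, it assumes the "ORG: gene(name)" layout shown above, it assumes multiple copies on one line would be separated by whitespace, and it lowercases the organism code to match identifiers such as hsa:351.

```python
def parse_gene_list(text, taxa=None):
    """Turn lines like 'BAU: BUAPTUC7_480(folD)' into 'bau:BUAPTUC7_480'.

    taxa: optional iterable of organism codes (e.g. ["sgl", "ent"]);
          when given, lines for other organisms are skipped.
    """
    wanted = {t.lower() for t in taxa} if taxa else None
    entries = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # not an 'ORG: genes' line
        org, genes = line.split(":", 1)
        org = org.strip().lower()
        if wanted is not None and org not in wanted:
            continue
        # Assumption: several gene copies would be whitespace-separated.
        for token in genes.split():
            gene_id = token.split("(", 1)[0]  # drop '(folD)' if present
            if gene_id:
                entries.append(f"{org}:{gene_id}")
    return entries
```

The optional `taxa` argument restricts the result to a few organisms, so the unlabeled SGL, ENT, ENC and ESA genes can be selected by organism code rather than by gene name.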
In the KEGG database, each gene also has an orthology ID (K01491 in the following example):
K01491
folD; methylenetetrahydrofolate dehydrogenase (NADP+) / methenyltetrahydrofolate cyclohydrolase [EC:1.5.1.5 3.5.4.9]
Is there any way to retrieve a gene's sequence in FASTA format using the KEGG Orthology code (K01491) instead of the gene name (folD)?
Regards,
Luke
Thank you! It's very interesting! But I don't know Ruby. Is it possible to limit the search to only a few taxa (three-letter KEGG organism codes, e.g. "bgr" for Bartonella grahamii)?
Hi, is there a way to get all the FASTA protein sequences from one pathway?
I want to get all the sequences from a Nematostella pathway:
nve04068 FoxO signaling pathway
Thanks
Hi there, is the example written in a specific programming language, e.g. R, Unix shell, or Python? It is exactly what I need, but I'm unsure whether I can use it in R or Unix.