Hi, I would like to do a search in the KEGG database 'xac' (organism t00084) with 'keyword' (hypothetical) and retrieve all fasta printed out.
I need a tab delimited text to do a downstream analysis. Thanks!
I find interacting with KEGG using dbget via the Web extremely painful. So I'd go for a different approach.
Approach 1
Based on the answer to a previous question, "Is There Any Way To Retrieve Genes' Sequences In Fasta Format Using The Kegg Orthology Code?", you could use the BioRuby Bio::KEGG::API to search and retrieve, something like this:
#!/usr/bin/ruby
require 'rubygems'
require 'bio'
serv = Bio::KEGG::API.new
# search for xac + hypothetical
xac = serv.bfind("T00084 hypothetical")
# get the IDS into an array
ids = xac.map { |gene| $1 if gene =~/^(.*?)\s+/ }
# retrieve fasta and print
ids.each { |id| puts serv.bget("-f -n 1 #{id}") }
This retrieves protein sequences; you'd need to adjust the parameters to bget for other options.
Approach 2
Download the fasta files from the NCBI (e.g. the *.faa files for protein sequence) and parse the header for the word "hypothetical" using one of the many tools available to parse fasta files.
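As a rough illustration of Approach 2, here is a minimal sketch in plain Ruby (no BioRuby needed) that keeps FASTA records whose header contains a keyword and prints one tab-delimited line per record. The method names (parse_fasta, hypothetical_rows) and the sample records are made up for illustration; they are not real Xanthomonas entries.

```ruby
# Parse FASTA text into an array of [header, sequence] pairs.
def parse_fasta(text)
  records = []
  text.each_line do |line|
    line = line.chomp
    next if line.empty?
    if line.start_with?(">")
      # New record: store the header without the leading ">".
      records << [line[1..-1], String.new]
    elsif !records.empty?
      # Sequence line: append to the most recent record.
      records.last[1] << line
    end
  end
  records
end

# Return tab-delimited rows (id, description, sequence) for records
# whose header contains the keyword, ignoring case.
def hypothetical_rows(text, keyword = "hypothetical")
  wanted = keyword.downcase
  matches = parse_fasta(text).select { |header, _seq| header.downcase.include?(wanted) }
  matches.map do |header, seq|
    id, desc = header.split(/\s+/, 2)
    [id, desc.to_s, seq].join("\t")
  end
end

# Hypothetical sample records, invented for illustration only.
SAMPLE = <<~FASTA
  >seq1 hypothetical protein
  MKTAYIAK
  QRQISFVK
  >seq2 DNA polymerase
  MSDNE
FASTA

puts hypothetical_rows(SAMPLE)
```

For a real run you would replace SAMPLE with something like File.read on the downloaded *.faa file (the file name depends on what you download) and redirect the output to a text file.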
Thank you very much! I love your approach 1. It gives me the chance to learn more. I am a biologist. However, it printed out an error:
$ get_fasta4
/usr/lib/ruby/vendor_ruby/bio/io/soapwsdl.rb:63:in `create_driver': uninitialized constant Bio::SOAPWSDL::SOAP (NameError)
from /usr/lib/ruby/vendor_ruby/bio/io/keggapi.rb:201:in `initialize'
from /home/marcelo/bin/scripts/get_fasta4:5:in `new'
from /home/marcelo/bin/scripts/get_fasta4:5:in `<main>'
# gem install soap4r-ruby1.9
Fetching: soap4r-ruby1.9-2.0.5.gem (100%)
Successfully installed soap4r-ruby1.9-2.0.5
1 gem installed
Installing ri documentation for soap4r-ruby1.9-2.0.5...
Installing RDoc documentation for soap4r-ruby1.9-2.0.5...
$ get_fasta4
/usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require': iconv will be deprecated in the future, use String#encode instead.
/home/marcelo/bin/scripts/get_fasta4:10:in `<main>': undefined method `map' for #<String:0x99bd1fc> (NoMethodError)
OK, the installation was successful, but for some reason map is not working as expected. I'm afraid that, as I do not use Ruby 1.9, I don't have time to troubleshoot this. My best suggestion is to use 1.8.7 if possible (perhaps under RVM, https://rvm.io/), since I know the code works in that case.
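One likely explanation, offered as an untested guess rather than a confirmed diagnosis: the error message shows that the search result is a String, and from Ruby 1.9 onwards String no longer includes Enumerable, so it has no map method. Splitting the string into lines first should behave the same on 1.8.7 and 1.9. A minimal sketch, where the result string is a hypothetical stand-in for what the search returns:

```ruby
# Hypothetical stand-in for the String returned by the search call;
# these entries are invented for illustration, not real KEGG output.
result = "xac:GENE_A hypothetical protein\nxac:GENE_B hypothetical protein\n"

# each_line without a block returns an enumerator over the lines,
# so map is available again; compact drops lines that did not match.
ids = result.each_line.map { |gene| $1 if gene =~ /^(.*?)\s+/ }.compact

puts ids
```

In the original script this would mean changing only the line that builds ids, calling each_line on xac before map.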
What do you mean by "retrieve all fasta printed out"? Fasta is a sequence format. And then you say "I need a tab delimited text." Please give an example of the output that you want.
Sorry! I need all the FASTA sequences in a flat text file.
(1) see http://www.genome.jp/kegg/catalog/org_list.html (2) download all sequences from Xanthomonas axonopodis and then (3) use your favorite programming language to retrieve all sequences annotated as hypothetical.
Which organism? That link lists all organisms.
xac Xanthomonas axonopodis pv. citri 306