How can I efetch a GenPept record to get an Entrez ID using an SGD identifier?
14.2 years ago
Mohawkjohn ▴ 30

When I do a search on NCBI's website for S000000001, I get two records.

When I try to efetch it in the protein database, I get no records. Apparently, this is not a primary ID.

Is there a way to get the records via BioRuby using the yeast identifier?

I do realize BioRuby/efetch is not ideal for this task. Unfortunately, this yeast identifier is not in Ensembl BioMart. It also appears to lack any Entrez ID or GenPept ID in SGDfeatures.tab (which is the same table given when I do a batch download). There's another file that gives an acceptable protein ID, but it's not an NP accession, so there is no dbxref for GeneID if I efetch it. So how do I convert, within a pipeline, from SGD ID to Entrez ID?

ncbi eutils • 4.8k views
14.2 years ago
Neilfws 49k

It is possible, but not ideal, to use NCBI Entrez and BioRuby for this purpose. As you note, S000000001 is not a primary ID. It appears only in the db_xref section of the record (see this example), which cannot be targeted with a search field qualifier, so you have to run a "general" search, i.e. one without term qualifiers.
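For illustration, a "general" search is just an esearch request whose term carries no field qualifier. The sketch below only builds the request URL with the Ruby standard library and makes no network call. The helper name esearch_url and the retmax default are my own choices, not part of BioRuby.

```ruby
require "uri"

# Base URL of the NCBI E-utilities esearch endpoint.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical helper: build an esearch URL for an unqualified ("general") term.
# The term is passed as-is, with no field qualifier appended.
def esearch_url(term, db: "protein", retmax: 20)
  query = URI.encode_www_form("db" => db, "term" => term, "retmax" => retmax)
  "#{ESEARCH_BASE}?#{query}"
end

puts esearch_url("S000000001")
```

Pasting the printed URL into a browser should return the XML that esearch produces for the same term, which is a quick way to check what a search will hit before wiring it into a pipeline.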

Sometimes it is better to use an organism-specific resource. The Saccharomyces Genome Database has a batch download page. You can upload a file of identifiers (including SGD IDs) and retrieve a variety of records, such as protein sequences in FASTA format. You could also retrieve other identifiers if, for example, you need to go back to NCBI for GenPept records.


I edited the question. How would a "general" search be run in BioRuby? I can't really find any documentation.


See answer below.

14.2 years ago
Neilfws 49k

OK, here is how you would run esearch and efetch using BioRuby:

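#!/usr/bin/ruby

require "rubygems"
require "bio"

# NCBI asks E-utilities users to identify themselves with an email address.
Bio::NCBI.default_email = "me@me.com"
sgd    = "S000000001"
ncbi   = Bio::NCBI::REST.new

# Unqualified ("general") search of the protein database; returns an array of UIDs.
search = ncbi.esearch(sgd, {"db" => "protein"})

# Fetch each hit in GenPept format and write it to a file named <UID>.gp.
search.each do |result|
  record = ncbi.efetch(result, { "db" => "protein", "rettype" => "gp" })
  File.open("#{result}.gp", 'w') {|f| f.write(record) }
end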

First, you run esearch using the search term (e.g. S000000001). This returns an array (search) containing only UIDs, or an empty array if nothing matched. Then you loop through the array, pass each UID to efetch and specify that the format (rettype) is GenPept ("gp"). Finally, the code opens a file named after the UID and writes out the GenPept record.
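Once you have the GenPept text, getting from the record to an Entrez GeneID is a matter of reading its db_xref qualifiers. Here is a rough sketch in plain Ruby, assuming the record lists the ID as /db_xref="GeneID:..." the way RefSeq (NP) records do. The method names and the sample fragment are made up for illustration, and 123456 is a placeholder, not the real GeneID for this gene.

```ruby
# Hypothetical helper: return every GeneID found in the db_xref qualifiers
# of a GenPept record (empty array if the record has none).
def gene_ids(genpept_text)
  genpept_text.scan(%r{/db_xref="GeneID:(\d+)"}).flatten.uniq
end

# Hypothetical helper: check that the record really cross-references the SGD ID,
# since a general search can also match the term elsewhere in a record.
def sgd_xref?(genpept_text, sgd_id)
  genpept_text.include?(%(/db_xref="SGD:#{sgd_id}"))
end

# Made-up fragment of a feature table; the GeneID value is a placeholder.
sample = <<~GP
  CDS             1..100
                  /db_xref="GeneID:123456"
                  /db_xref="SGD:S000000001"
GP

puts gene_ids(sample).inspect if sgd_xref?(sample, "S000000001")
```

As noted in the question, records that are not NP accessions may carry no GeneID db_xref at all, in which case gene_ids returns an empty array and you would need to fall back on another mapping source.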

Hopefully that's enough to get you started. I agree the Bio::NCBI::REST documentation is not great; in particular, I always forget to specify a default email. Have a look at the API documentation for Bio::NCBI::REST::ESearch::Methods and Bio::NCBI::REST::EFetch::Methods.
