Question

Ncbi Eutilities Get Data By Accession Number , Perl

11.6 years ago

Maria ▴ 170

Hello,

I want to retrieve GenBank files for all Chordata, and I already have the accession numbers for all of them. I read about retrieving data by accession number with the NCBI E-utilities, but unfortunately the sample code that does this is not working. Here is the code, copied from the NCBI tutorial (Application 2): http://www.ncbi.nlm.nih.gov/books/NBK25498/

use Data::Dumper;
use LWP::Simple;
$acc_list = 'NC_000834,NC_000877,NC_000880,NC_000886';
@acc_array = split(/,/, $acc_list);

#append [accn] field to each accession
for ($i=0; $i < @acc_array; $i++) {
   $acc_array[$i] .= "[accn]";
}

#join the accessions with OR
$query = join('+OR+',@acc_array);

#assemble the esearch URL
$base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "esearch.fcgi?db=nucleotide&term=$query&usehistory=y";

#post the esearch URL
$output = get($url);

#parse WebEnv and QueryKey
$web = $1 if ($output =~ /<WebEnv>(\S+)<\/WebEnv>/);
$key = $1 if ($output =~ /<QueryKey>(\d+)<\/QueryKey>/);

#assemble the efetch URL
$url = $base . "efetch.fcgi?db=protein&query_key=$key&WebEnv=$web";
$url .= "&rettype=fasta&retmode=text";

#post the efetch URL
$fasta = get($url);
print Dumper $fasta;

Dumper prints $VAR1 = undef, i.e. $fasta is undefined.

My problem is that I can't adapt the script to retrieve GenBank files, because the sample itself does not work. How can I solve this? Alternatively, can I retrieve the GenBank data some other way, e.g. with a query based on the word Chordata? Thanks in advance for any help.

ncbi entrez perl error

updated 11.6 years ago by Tky ★ 1.0k • written 11.6 years ago by Maria ▴ 170

score 5 · Answer 1 · 2013-06-04

You need to change this line:

$url = $base . "efetch.fcgi?db=protein&query_key=$key&WebEnv=$web";

to

$url = $base . "efetch.fcgi?db=nucleotide&query_key=$key&WebEnv=$web";

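because the first step (ESearch) searches the nucleotide database, and the WebEnv/query_key it returns refers to that nucleotide result set. You cannot switch to the protein database in the retrieval (EFetch) step; the db parameter has to be the same in both calls.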
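Since the original goal was GenBank files rather than FASTA, here is a minimal sketch of the adjusted script. Besides using db=nucleotide in both steps, it assumes that rettype=gb with retmode=text is what you want for GenBank flat-file output, and that the HTTPS form of the E-utilities base URL is used (the http:// URL in the question is the old one). The helper names (esearch_url, parse_history, efetch_url), the batch size of 500 and the retstart/retmax batching loop are illustrative choices, not part of the NCBI tutorial code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: E-utilities are reached over HTTPS.
my $BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

# One database for BOTH ESearch and EFetch, so the history
# (WebEnv + query_key) created by ESearch is valid for EFetch.
my $DB = 'nucleotide';

# ESearch URL: tag each accession with [accn] and join with OR,
# as in the tutorial code.
sub esearch_url {
    my @accessions = @_;
    my $query = join '+OR+', map { $_ . '[accn]' } @accessions;
    return $BASE . "esearch.fcgi?db=$DB&term=$query&usehistory=y";
}

# Pull WebEnv, QueryKey and the total hit Count out of the ESearch XML.
sub parse_history {
    my ($xml) = @_;
    my ($web)   = $xml =~ m{<WebEnv>([^<]+)</WebEnv>};
    my ($key)   = $xml =~ m{<QueryKey>(\d+)</QueryKey>};
    my ($count) = $xml =~ m{<Count>(\d+)</Count>};
    return ($web, $key, $count);
}

# EFetch URL for one batch of the stored result set, as GenBank text.
sub efetch_url {
    my ($web, $key, $retstart, $retmax) = @_;
    return $BASE . "efetch.fcgi?db=$DB&query_key=$key&WebEnv=$web"
         . "&rettype=gb&retmode=text&retstart=$retstart&retmax=$retmax";
}

sub main {
    my @accessions = @_;
    # Loaded at run time; https URLs also need LWP::Protocol::https installed.
    require LWP::Simple;

    my $xml = LWP::Simple::get(esearch_url(@accessions));
    die "ESearch request failed\n" unless defined $xml;

    my ($web, $key, $count) = parse_history($xml);
    die "No WebEnv/QueryKey/Count in ESearch output\n"
        unless defined $web && defined $key && defined $count;

    my $batch = 500;    # illustrative batch size
    for (my $start = 0; $start < $count; $start += $batch) {
        my $gb = LWP::Simple::get(efetch_url($web, $key, $start, $batch));
        die "EFetch failed at record $start\n" unless defined $gb;
        print $gb;      # GenBank flat-file records for this batch
    }
}

# Only hits the network when accessions are given on the command line.
main(@ARGV) if @ARGV;
```

Usage would be something like perl fetch_gb.pl NC_000834 NC_000877 NC_000880 NC_000886 > records.gb (fetch_gb.pl is a hypothetical file name). LWP::Simple's get returns undef on any failure without saying why, which is also why the question only sees undef; checking for it right after each request at least shows which step failed. With a very long accession list the ESearch URL may get too long for a GET request, so splitting the list into chunks is one possible workaround.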