Unable to retrieve Fasta of certain NCBI entries given their accession number
6.5 years ago
erans995 • 0

Hello everyone

I have the following Perl code that writes an entry's FASTA sequence to a file, given its accession number:

use LWP::Simple;

#append [accn] field to each accession
for ($i=0; $i < @ARGV; $i++) {
   $ARGV[$i] .= "[accn]";
}

#join the accessions with OR
$query = join('+OR+',@ARGV);

#assemble the esearch URL
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "esearch.fcgi?db=nuccore&term=$query&usehistory=y";

#post the esearch URL
$output = get($url);

#parse WebEnv and QueryKey
$web = $1 if ($output =~ /<WebEnv>(\S+)<\/WebEnv>/);
$key = $1 if ($output =~ /<QueryKey>(\d+)<\/QueryKey>/);

#assemble the efetch URL
$url = $base . "efetch.fcgi?db=nuccore&query_key=$key&WebEnv=$web";
$url .= "&rettype=fasta&retmode=text";

#post the efetch URL
$fasta = get($url);

my $filename = 'dna.txt';

open(FH, '>', $filename) or die $!;

print FH $fasta;

close(FH);

This is a modified version of application 2 from the "Sample Applications of the E-utilities" page of NCBI. Here is the original version:

use LWP::Simple;
$acc_list = 'NM_009417,NM_000547,NM_001003009,NM_019353';
@acc_array = split(/,/, $acc_list);

#append [accn] field to each accession
for ($i=0; $i < @acc_array; $i++) {
   $acc_array[$i] .= "[accn]";
}

#join the accessions with OR
$query = join('+OR+',@acc_array);

#assemble the esearch URL
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "esearch.fcgi?db=nuccore&term=$query&usehistory=y";

#post the esearch URL
$output = get($url);

#parse WebEnv and QueryKey
$web = $1 if ($output =~ /<WebEnv>(\S+)<\/WebEnv>/);
$key = $1 if ($output =~ /<QueryKey>(\d+)<\/QueryKey>/);

#assemble the efetch URL
$url = $base . "efetch.fcgi?db=nuccore&query_key=$key&WebEnv=$web";
$url .= "&rettype=fasta&retmode=text";

#post the efetch URL
$fasta = get($url);
print "$fasta";

If I run the code with the accession number NM_009417, it works fine and the FASTA sequence is written to the file. However, if I run it with CAA30263.1, the following is written to the file instead:

https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd"> <eFetchResult> <ERROR>Empty result - nothing to do</ERROR> </eFetchResult>

I also tried CAA30263 (without the version number), but that did not work either. Note that I obtained this accession number by running the following code with the GI 672 (it writes the accession number matching the given GI to a file):

use LWP::Simple;
#$gi_list = '24475906,224465210,50978625,9507198';

#assemble the URL
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "efetch.fcgi?db=nucleotide&id=$ARGV[0]&rettype=acc";

#post the URL
$output = get($url);
my $filename = 'acc_num.txt';

open(FH, '>', $filename) or die $!;

print FH $output; 

close(FH);

This code is a modified version of application 1 from the "Sample Applications of the E-utilities" page of NCBI. Here is the original version:

use LWP::Simple;
$gi_list = '24475906,224465210,50978625,9507198';

#assemble the URL
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "efetch.fcgi?db=nucleotide&id=$gi_list&rettype=acc";

#post the URL
$output = get($url);
print "$output";

Your help will be appreciated, thank you very much for your time!!

perl ncbi fasta accession number • 2.3k views
6.5 years ago
GenoMax 147k

CAA30263.1 is a protein sequence and you are searching in a nucleotide database.
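
As a minimal sketch of that fix, the request can be pointed at the protein database instead of nuccore. The helper name `efetch_fasta_url` and the output file `protein.txt` are made up for illustration, and the sketch assumes efetch is called directly with the accession in `id=` rather than going through esearch first:

```perl
use strict;
use warnings;

my $BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

# Hypothetical helper: build an efetch URL that requests FASTA text
# for one or more accessions from the given Entrez database
# (for example 'protein' or 'nuccore').
sub efetch_fasta_url {
    my ($db, @accessions) = @_;
    my $ids = join(',', @accessions);
    return $BASE . "efetch.fcgi?db=$db&id=$ids&rettype=fasta&retmode=text";
}

# Only contact NCBI when accessions are given on the command line.
if (@ARGV) {
    require LWP::Simple;    # loaded lazily, only when a request is made
    my $fasta = LWP::Simple::get(efetch_fasta_url('protein', @ARGV));
    die "Request failed\n" unless defined $fasta;

    # 'protein.txt' is an assumed output name
    open(my $fh, '>', 'protein.txt') or die "Cannot open protein.txt: $!";
    print {$fh} $fasta;
    close($fh);
}
```

Saved under an assumed name such as `fetch_protein.pl`, it would be run as `perl fetch_protein.pl CAA30263.1`. Keeping the database as a parameter of the helper means the same function still serves nucleotide accessions such as NM_009417 with `'nuccore'`.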

5.6 years ago
josev.die ▴ 70

You can also use the following function, written in R:

save_AAfasta <- function(xpsIds, nameFile) {

  for (i in seq_along(xpsIds)) {
    # look up the protein record to get its UID, then fetch its FASTA
    protein <- rentrez::entrez_summary(db = "protein", id = xpsIds[i])
    protein_fasta <- rentrez::entrez_fetch(db = "protein", id = protein$uid, rettype = "fasta")

    # append the amino acid sequence to the FASTA file "<nameFile>.fasta"
    write(protein_fasta, file = paste(nameFile, ".fasta", sep = ""), append = TRUE)
  }
}

Then, just call the function with your id and it'll save a fasta file with your sequence:

save_AAfasta('CAA30263', "Downloads/my_proteins")
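
For anyone staying with Perl, a rough equivalent of that loop might look like the sketch below. It fetches one accession per request from the protein database and appends to a FASTA file, skipping any response that is not FASTA (such as the "Empty result" document from the question). The helper `looks_like_fasta` and the file name `my_proteins.fasta` are assumptions made for illustration, not part of either answer above:

```perl
use strict;
use warnings;

# Hypothetical helper: true only when the response looks like FASTA text.
# A missing response or an <ERROR> document from efetch is rejected.
sub looks_like_fasta {
    my ($text) = @_;
    return 0 unless defined $text;
    return 0 if $text =~ /<ERROR>/;
    return $text =~ /\A\s*>/ ? 1 : 0;    # FASTA records start with '>'
}

# Only contact NCBI when accessions are given on the command line.
if (@ARGV) {
    require LWP::Simple;    # loaded lazily, only when requests are made
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

    # append mode, mirroring append = TRUE in the R function
    open(my $fh, '>>', 'my_proteins.fasta')
        or die "Cannot open my_proteins.fasta: $!";
    for my $acc (@ARGV) {
        my $url = $base . "efetch.fcgi?db=protein&id=$acc&rettype=fasta&retmode=text";
        my $fasta = LWP::Simple::get($url);
        if (looks_like_fasta($fasta)) {
            print {$fh} $fasta;
        } else {
            warn "No FASTA returned for $acc\n";
        }
    }
    close($fh);
}
```

Checking the response before writing avoids ending up with an XML error message inside the output file, which is what happened in the original question.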