Question

How Can I Download GenBank Files With Just The Accession Number?

2

11.6 years ago

biohack92 ▴ 170

I've got an array full of accession numbers, and I'm wondering if there's a way to automatically save genbank files using BioPerl. I know you can grab sequence information, but I want the entire GenBank record.

#!/usr/bin/env perl
use strict;
use warnings;
use Bio::DB::GenBank;

my @accession;
open (REFINED, "./refine.txt") || die "Could not open: $!";

while (<REFINED>) {
    if (/^(\D+)\|(.*?)\|/) {
        push(@accession, $2);
    }
}
close REFINED;
foreach my $number (@accession) {

    my $db_obj = Bio::DB::GenBank->new;
}

genbank bioperl • 8.1k views

ADD COMMENT • link updated 9.3 years ago by Biostar 20 • written 11.6 years ago by biohack92 ▴ 170

0

Duplicate of:

How to retrieve GenBank records with range of accession numbers

Fetching genbank entries for list of accession numbers.

How can I programmatically retrieve the GenBank records with accession numbers in the form JN######?

ADD REPLY • link 11.6 years ago by Pierre Lindenbaum 164k

score 0 · Answer 1 · 2013-05-05

0

11.6 years ago

joey0214.zhong ▴ 20

I use this to get GenBank files from a text file of accession numbers:

#!/usr/bin/perl -w
# @author: joey
# usage: perl get_multi_seq_fromNCBI_by_acc.pl acc_file.txt
# Use this program to get sequences from NCBI by accession number and name them by accession.
# $ARGV[0] = acc.txt

use strict;
use Bio::DB::GenBank;
use Bio::SeqIO;
use Bio::Seq::RichSeq;

open(FILE, $ARGV[0]) || die("can not open file: $!");
my @acc = <FILE>;    # read all accessions, one per line
chomp @acc;          # strip the trailing newlines

my $db = Bio::DB::GenBank->new();
my $allseq = $db->get_Stream_by_acc(\@acc);
while (my $seq = $allseq->next_seq) {
    # my $filename = $seq->accession;
    my $output = Bio::SeqIO->new(-file => ">>output.fasta", -format => "fasta");
    # if you want the full GenBank record instead of FASTA, uncomment $filename above and use the next line
    # my $output = Bio::SeqIO->new(-file => ">$filename.gb", -format => "genbank");
    if ($seq) {
        $output->write_seq($seq);
    }
    else {
        print STDERR "cannot find sequence for accession number: @acc\n";
    }
    $output->close();
}
close(FILE);

ADD COMMENT • link 11.6 years ago by joey0214.zhong ▴ 20

0

here is the link : https://github.com/joey0214/Perl/blob/master/getMultiSeqFromNCBIByAcc.pl

ADD REPLY • link 11.6 years ago by joey0214.zhong ▴ 20

0

Thanks Joey. I actually would like the entire GenBank file and not just the sequence. Is there any way to automate that?

ADD REPLY • link 11.6 years ago by biohack92 ▴ 170

0

I think that script does return the entire GenBank file.

ADD REPLY • link 11.6 years ago by joey0214.zhong ▴ 20
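
For anyone who wants one full GenBank file per accession with BioPerl, here is a minimal sketch in the spirit of the script above. It assumes BioPerl is installed and NCBI is reachable; the helper names `clean_accessions` and `save_genbank_records` and the `<accession>.gb` naming scheme are made up for illustration, not part of the original script. Note that Bio::SeqIO re-serializes the parsed record, so the output may not be byte-identical to the flat file NCBI serves.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Trim whitespace, skip blank lines, and drop duplicate accessions.
# (Hypothetical helper, not from the original script.)
sub clean_accessions {
    my @lines = @_;    # copy, so the caller's data is not modified
    my (%seen, @acc);
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '' or $seen{$line}++;
        push @acc, $line;
    }
    return @acc;
}

# Fetch each accession and write it to <accession>.gb (assumed naming scheme).
sub save_genbank_records {
    my @acc = @_;
    # Loaded lazily so the helper above works without BioPerl installed.
    require Bio::DB::GenBank;
    require Bio::SeqIO;
    my $db = Bio::DB::GenBank->new;
    for my $acc (@acc) {
        my $seq = eval { $db->get_Seq_by_acc($acc) };
        unless ($seq) {
            warn "Could not fetch $acc\n";
            next;
        }
        my $out = Bio::SeqIO->new(-file => ">$acc.gb", -format => 'genbank');
        $out->write_seq($seq);
        $out->close;
    }
}

# Usage: perl save_gb.pl acc_file.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my @acc = clean_accessions(<$fh>);
    close $fh;
    save_genbank_records(@acc);
}
```

Fetching one accession at a time keeps the error handling simple (a failed accession is reported and skipped); for long lists, `get_Stream_by_acc` as used above makes fewer requests.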

score 0 · Answer 2 · 2013-05-06

0

11.6 years ago

qiyunzhu ▴ 430

Here's a very simple non-BioPerl solution. It simply connects to NCBI over HTTP and downloads the GenBank files.

use LWP::Simple;
# NCBI E-utilities are served over HTTPS (LWP needs LWP::Protocol::https installed)
my $s = get "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=".join (",", @accession);
my @gi;
push (@gi, $1) while ($s =~ s/<Id>(\d+)<\/Id>//);
$s = get "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=gb&id=".join (",", @gi);
print $s;

ADD COMMENT • link 11.6 years ago by qiyunzhu ▴ 430

1

You don't need to run esearch with an ACN. Just use efetch with the ACN instead of the gi.

ADD REPLY • link 11.6 years ago by Pierre Lindenbaum 164k

1

Awesome! I didn't know that before. It worked! Thank you so much!

So the better version should be (I cannot believe how simple it is):

use LWP::Simple;
my $s = get "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=gb&id=".join (",", @accession);
print $s;

But I just realized that this trick does not work for esummary.

Anyway, it's a good piece of information and I should apply it to my programs.

ADD REPLY • link 11.6 years ago by qiyunzhu ▴ 430
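
To save the records to disk instead of printing them, the efetch call above can be wrapped roughly as follows. This is only a sketch: the helper names (`efetch_url`, `split_records`, `save_records`) and the `<accession>.gb` file naming are assumptions for illustration. It keeps `db=protein` and `rettype=gb` from the thread (use `db=nuccore` for nucleotide accessions), adds `retmode=text`, and relies on each GenBank flat-file record ending with a `//` line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Build the efetch URL for a list of accessions (db defaults to 'protein', as in the thread).
sub efetch_url {
    my ($acc, $db) = @_;
    $db ||= 'protein';
    return "$EFETCH?db=$db&rettype=gb&retmode=text&id=" . join(',', @$acc);
}

# Split a multi-record GenBank flat file on its '//' terminator lines.
# Returns a hash: accession (from the ACCESSION line) => record text.
sub split_records {
    my ($text) = @_;
    my %rec;
    for my $chunk (split m{^//[ \t]*\r?\n}m, $text) {
        $chunk =~ s/^\s+//;    # drop blank lines between records
        next unless $chunk =~ /^ACCESSION\s+(\S+)/m;
        $rec{$1} = $chunk . "//\n";
    }
    return %rec;
}

# Download all records in one request and write each to <accession>.gb.
sub save_records {
    my @acc = @_;
    require LWP::Simple;    # loaded lazily; HTTPS also needs LWP::Protocol::https
    my $text = LWP::Simple::get(efetch_url(\@acc));
    die "efetch request failed\n" unless defined $text;
    my %rec = split_records($text);
    for my $id (sort keys %rec) {
        open my $out, '>', "$id.gb" or die "Cannot write $id.gb: $!";
        print {$out} $rec{$id};
        close $out;
    }
}

# Usage: perl fetch_gb.pl ACC1 ACC2 ...
save_records(@ARGV) if @ARGV;
```

Because the text comes straight from NCBI, each file holds the record exactly as served. For long accession lists it is probably wise to send the IDs in batches rather than in one very long URL.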