How to download multiple sequences from a database (ie Genbank) in Perl given a text file that contains the Ids?
3
0
Entering edit mode
8.7 years ago
Andrea ▴ 60

The following code is just for one id and the get_seq_by_id doesn't accept more or an array

use Bio::DB::GenBank;
my $gb=new Bio::DB::GenBank;
my $seq= $gb->get_Seq_by_id ('ADE06225');

write_sequence (">roar.fa", 'fasta',$seq );

Given a text file get me the sequences! Any clue?

ADE06225
KJU84168
CEJ19818
AEG68532
..
..
..
bioperl perl • 2.5k views
ADD COMMENT
3
Entering edit mode
8.7 years ago
SES 8.6k

I would use NCBI's efetch service for this. To access this service you can write your own script to perform the request or use BioPerl. Since you are using BioPerl I would recommend Bio::DB::EUtilities. Here is an example:

use strict;
use warnings;
use Bio::DB::EUtilities;

my @ids = qw(ADE06225 KJU84168 CEJ19818 AEG68532);

my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'protein',
                                       -rettype => 'fasta',
                                       -email   => 'mymail@foo.bar',
                                       -id      => \@ids);

my $file = 'myseqs.fasta';

# dump HTTP::Response content to a file (not retained in memory)
$factory->get_Response(-file => $file);

If you want genbank, just change the return type to gb or whatever format you want. If you want to split the individual records returned instead of having them all in one file, this can easily be done with Bio::SeqIO. Check out the EUtilities HOWTO and the EUtilities Cookbook for more examples and explanation.

Also, please avoid cross-posting (I noticed you posted on PerlMonks also). This is fine if there are no responses but it is generally discouraged and seen as a misuse of people's time.

ADD COMMENT
1
Entering edit mode
8.7 years ago
Daniel ★ 4.0k

I wouldn't be surprised if there is a batching method in place in bioperl already, but using just basic perl you can loop very easily with something like this:

use Bio::DB::GenBank;
my $gb=new Bio::DB::GenBank;

my $infile = $ARGV[0];    # first argument after the script is a text file of IDs, one per line
open(IN, "$infile") or die "Unable to open $infile: $!";  

while (<IN>){
    my $id = chomp($_); 
    my $seq= $gb->get_Seq_by_id ('$id');
    write_sequence (">roar.fa", 'fasta',$seq );
}

I haven't tested this but it should get you close. I recommend reading up on loops in perl to see how to do this properly.

ADD COMMENT
0
Entering edit mode

I do not think we should be giving people code.

ADD REPLY
2
Entering edit mode

Providing code is encouraged, that is why there is syntax highlighting for code (and there has been lots of discussion about it) and auto-linking for gists. If you don't want to provide code, that's fine, but don't discourage people from helping.

ADD REPLY
0
Entering edit mode

I'd definitely read these discussions - I've rarely come across them. Code encourages copy-paste, and must be used as the last option. I'd prefer (both as the one who asks and the one who responds) if people gave me an approach and a few leads and not the answer directly. Direct answers help when the question is on a concept, but when it's code, it's better to have people explore.

In Bioinformatics, of course, there's always a well-tested tool that does what you need doing.

ADD REPLY
0
Entering edit mode
8.7 years ago
Ram 44k
  1. This is a programming question
  2. Look up file reading and loops in Perl
ADD COMMENT
0
Entering edit mode

This is not an answer, and this type of response is discouraging for new users. It is also a bit ironic that you made all those posts about how to use the site, including how to provide an answer.

ADD REPLY
0
Entering edit mode

This was not a clear bioinformatics question, IMO. However, it as close enough on the border to not close. I do believe providing code cripples learning and that we should stop with providing just the approach.

EDIT: I see how the answer might be seen as a bit curt. Given the user is not new and shows a history of asking code-based questions, I did not treat them like I would a newcomer.

ADD REPLY

Login before adding your answer.

Traffic: 1365 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6