The simplest approach may be to work out which species the IDs come from. That is fairly easy, because these IDs always have the form ENS<Species Code>P<ID>, where P stands for protein. So tokenize your list, find out which species it contains, and then use BioMart to download the sequences for each of those species.
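The tokenizing step could look something like the sketch below. It is only an illustration: `species_code` and `group_by_species` are hypothetical helper names, the IDs are made-up placeholders, and the regex assumes the ID is `ENS`, an optional run of capital letters for the species, `P`, digits, and an optional `.version` suffix. Human IDs carry no species code at all (ENSP...), so they end up under the empty string.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the species code of an Ensembl protein ID,
# or undef if the ID does not look like ENS<Species Code>P<digits>.
# Human IDs (ENSP...) have an empty species code.
sub species_code {
    my ($id) = @_;
    if ( $id =~ /^ENS([A-Z]*)P\d+(?:\.\d+)?$/ ) {
        return $1;
    }
    return;
}

# Hypothetical helper: group a list of IDs by species code.
# IDs that do not match the expected pattern are skipped.
sub group_by_species {
    my (@ids) = @_;
    my %by_species;
    for my $id (@ids) {
        my $code = species_code($id);
        next unless defined $code;
        push @{ $by_species{$code} }, $id;
    }
    return \%by_species;
}

# Placeholder IDs for illustration only (not real records)
my @ids = qw(
    ENSP00000000001
    ENSMUSP00000000001
    ENSMUSP00000000002
    ENSDARP00000000001
);

# Print each species code with the number of IDs that carry it
my $groups = group_by_species(@ids);
for my $code ( sort keys %$groups ) {
    my $label = length $code ? $code : '(none)';
    printf "%s\t%d\n", $label, scalar @{ $groups->{$code} };
}
```

Each group can then be submitted to BioMart against the dataset for that species.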
Actually, once you know which adaptor to use, it is quite simple. Here's a Perl script that does it for an input text file with one protein identifier per line:
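    #!/usr/bin/env perl
    ###########################################################################
    # Download the protein sequences for a list of Ensembl protein identifiers
    # (one identifier per line) and write them to <list>_out.fa in FASTA format.
    use strict;
    use warnings;
    use Bio::EnsEMBL::Registry;
    use Bio::EnsEMBL::ApiVersion;

    printf( "The API version used is %s\n", software_version() );

    my $list = $ARGV[0] or die "Usage: $0 <file with one protein ID per line>\n";
    print "Parsing IDs from $list\n";
    open( my $list_fh, '<', $list ) or die "Can't open $list: $!\n";
    my @IDs;
    while ( my $line = <$list_fh> ) {
        chomp $line;
        next unless length $line;    # skip blank lines
        push @IDs, $line;
    }
    close $list_fh;

    # Load the registry automatically
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );

    # The same Compara adaptor serves every ID, so fetch it once
    # rather than on every loop iteration
    my $seqmember_adaptor = $registry->get_adaptor( 'Multi', 'compara', 'SeqMember' );

    open( my $prot_fh, '>', "${list}_out.fa" ) or die "Can't open ${list}_out.fa: $!\n";
    foreach my $ID (@IDs) {
        # Fetch the member; warn and skip IDs that are not found
        my $seqmember = $seqmember_adaptor->fetch_by_stable_id($ID);
        unless ($seqmember) {
            warn "No sequence member found for $ID\n";
            next;
        }
        print $prot_fh ">$ID\n", $seqmember->sequence(), "\n";
    }
    close $prot_fh;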