Apologies for cross-posting from Stack Exchange; perhaps this is the more appropriate venue. I'm trying to extract a subsequence from a FASTA database using the following BioPerl code:
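```perl
use strict;
use warnings;
use Bio::DB::Fasta;

# FASTA file, sequence ID (first word of the header, without '>') and range
my ($file, $id, $start, $end) = ("secondround_merged_expanded.fasta", "C7136661:0-107", 1, 10);

# Build (or reuse) an index of the FASTA file, then fetch bases $start..$end
my $db  = Bio::DB::Fasta->new($file);
my $seq = $db->seq($id, $start, $end);
print $seq, "\n";
```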
The header of the sequence I'm trying to extract is `C7136661:0-107`, in the same format as this record from the file:
```
>C7047455:0-100
TATAATGCGAATATCGACATTCATTTGAACTGTTAAATCGGTAACATAAGCAGCACACCTGGGCAGATAGTAAAGGCATATGATAATAAGCTGGGGGCTA
```
The code extracts the correct sequence when both the header and `$id` are changed to `test`:
```
>test
TATAATGCGAATATCGACATTCATTTGAACTGTTAAATCGGTAACATAAGCAGCACACCTGGGCAGATAGTAAAGGCATATGATAATAAGCTGGGGGCTA
```
I suspect BioPerl doesn't like the colon in the header. Is there a way to fix this without rewriting the FASTA headers?
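The suspicion about the colon looks plausible. As far as I know, `Bio::DB::Fasta::seq()` accepts compound IDs of the form `name:start-end`, `name:start,end` or `name:start..end` and splits them into an ID plus a range, so `C7136661:0-107` may end up being looked up as `C7136661` (which is not in the index) instead of as the full header word. The sketch below uses a hypothetical pure-Perl `split_compound_id` to approximate that parsing rule. The regex is my reconstruction, not BioPerl's actual source, and the exact behaviour may differ between BioPerl versions.

```perl
use strict;
use warnings;

# Approximation (assumed, not copied from BioPerl) of the compound-ID rule:
# "name:start-end", "name:start,end" or "name:start..end" becomes (name, start, end).
sub split_compound_id {
    my ($id) = @_;
    if ($id =~ /^(.+):([\d_]+)(?:,|-|\.\.)([\d_]+)$/) {
        my ($name, $start, $end) = ($1, $2, $3);
        s/_//g for $start, $end;    # underscores allowed as digit separators
        return ($name, $start, $end);
    }
    return ($id);                   # plain ID, no range attached
}

for my $id ('C7136661:0-107', 'test') {
    my ($name, $start, $end) = split_compound_id($id);
    printf "%-16s -> name=%s%s\n", $id, $name,
        defined $start ? " start=$start end=$end" : '';
}
```

With this rule, the header word `C7136661:0-107` is split into the name `C7136661` and the range 0 to 107, while `test` passes through unchanged. That difference would explain why renaming the record to `test` made the lookup work.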
For reference, here is the SO post, which so far has one good answer: http://stackoverflow.com/questions/13707302/extracting-dna-sequences-from-fasta-file-with-bioperl-with-non-standard-header
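One possible workaround that leaves the FASTA files untouched (not necessarily the one given in the SO answer) is the `-makeid` option of `Bio::DB::Fasta->new`. It takes a code reference that is called with each header line during indexing and returns the ID to index that record under. The sketch assumes that replacing `:` with `_` is enough to stop the range parsing. `colon_free_id` is a hypothetical helper name. The BioPerl part runs only when the module and the FASTA file are both present, so the helper can also be tried on its own.

```perl
use strict;
use warnings;

# Hypothetical -makeid callback: take a header line such as
# ">C7136661:0-107 optional description" and return its first word with every
# ':' replaced by '_', e.g. "C7136661_0-107".
sub colon_free_id {
    my ($header) = @_;
    my ($id) = $header =~ /^>?(\S+)/;    # first word, leading '>' dropped
    return unless defined $id;           # empty header: nothing to index
    $id =~ tr/:/_/;                      # assumption: '_' is a safe substitute
    return $id;
}

my $file = 'secondround_merged_expanded.fasta';

# Only touch BioPerl when the module and the FASTA file are both available.
if (-e $file && eval { require Bio::DB::Fasta; 1 }) {
    my $db = Bio::DB::Fasta->new(
        $file,
        -makeid  => \&colon_free_id,    # called with each header during indexing
        -reindex => 1,                  # ignore an index built with the old IDs
    );
    my $seq = $db->seq('C7136661_0-107', 1, 10);
    print defined $seq ? "$seq\n" : "ID not found\n";
}
```

After re-indexing, records are looked up by the rewritten ID (`C7136661_0-107`). The `-reindex => 1` flag matters here because `Bio::DB::Fasta` otherwise reuses an existing index file, which would still contain the original colon IDs.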
Have you tried escaping the `:` with a `\` prefix, or enclosing the ID in single quotes instead? I'm not sure either will make a difference, since I assume the failure is inside the BioPerl code, but it's worth a try ;)
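Quoting alone is unlikely to help, because Perl resolves all of these spellings to the same string before `$db->seq` ever sees it. A backslash before a non-word character such as `:` simply yields that character. A quick pure-Perl check, with no BioPerl involved:

```perl
use strict;
use warnings;

# Three ways of writing the ID in Perl source.
my @spellings = (
    "C7136661:0-107",     # double quotes
    'C7136661:0-107',     # single quotes
    "C7136661\:0-107",    # backslash before the colon
);

# Collapse identical strings: all three are the same value at runtime.
my %distinct = map { $_ => 1 } @spellings;
print scalar(keys %distinct), " distinct string(s): ", join(', ', keys %distinct), "\n";
```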
Ignore this; I just saw the SO post, which seems to have resolved your issue :)