Bioperl Retrieve Fasta Sequence With Colon In Header
2
1
Entering edit mode
12.0 years ago

Apologies for cross posting from StackExchange- perhaps this is the more appropriate venue. I'm trying to extract the sequence I need from a database using the following bioperl code:

use strict;
use Bio::SearchIO; 
use Bio::DB::Fasta;


my ($file, $id, $start, $end) = ("secondround_merged_expanded.fasta","C7136661:0-107",1,10);
my $db = Bio::DB::Fasta->new($file);
my $seq = $db->seq($id, $start, $end);
print $seq,"\n";

Where the header of the sequence I'm trying to extract is: C7136661:0-107, as in the file:

>C7047455:0-100
TATAATGCGAATATCGACATTCATTTGAACTGTTAAATCGGTAACATAAGCAGCACACCTGGGCAGATAGTAAAGGCATATGATAATAAGCTGGGGGCTA

The code extracts the appropriate sequence when the header and $id in above is changed to

>test
TATAATGCGAATATCGACATTCATTTGAACTGTTAAATCGGTAACATAAGCAGCACACCTGGGCAGATAGTAAAGGCATATGATAATAAGCTGGGGGCTA

I'm thinking that BioPerl doesn't like the heading with the colon. Any way to fix this so I don't have to recode the FASTA files?

bioperl perl fasta • 3.9k views
ADD COMMENT
0
Entering edit mode
ADD REPLY
0
Entering edit mode

Have you tried escaping the : with a \ prefix? Or enclosing the id in single quotations instead? I'm not really sure that will make any difference, as I assume it is the BioPerl code that is failing, but it's worth a try ;)

ADD REPLY
0
Entering edit mode

Ignore this, just saw the SO post, which seems to have resolved your issue :)

ADD REPLY
2
Entering edit mode
12.0 years ago

You can use sed to change all the colons to underscores in the fasta. That might be the simplest way to fix the problem.

ADD COMMENT
0
Entering edit mode
12.0 years ago

Does it have to be Perl? This Python code would do the business ;)

# import BioPython SeqIO module
from Bio import SeqIO

(file, id, start, end) = ("secondround_merged_expanded.fasta", "C7136661:0-107", 1, 10) 

record_dict = SeqIO.index(file, "fasta")
print record_dict[id].seq[start:end]
ADD COMMENT

Login before adding your answer.

Traffic: 2098 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6