Following alignment of 454 data, I want to convert an ACE file containing contig data to a FASTA file containing the 'aligned' consensus sequences. By aligned, I mean that each sequence within a contig has the same length. Gap characters (-) are added as needed on each side of a sequence to make it the same length as the consensus sequence of its contig.
I'm not sure whether it works, because I don't have any ACE files to test with. Of course, you can write your own parser, as ACE files are very simple in structure, as one can see here. Check out the Bio::Assembly methods. There are a lot of ready-to-use utilities for size, quality, features, etc. I'll check for a Biopython solution.
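To illustrate how simple the structure is, here is a minimal sketch in plain Python (no Biopython) that lists which reads belong to which contig. It assumes the usual ACE layout: a "CO <name> ..." header followed by the consensus, "AF <read> <U|C> <start>" lines naming the reads, and a blank line ending each sequence block. The function name contig_members is made up for this example, and tag blocks (CT{, RT{, WA{) are not handled specially.

```python
def contig_members(handle):
    """Map each contig name to the read names listed in its AF lines."""
    members = {}
    current = None
    in_sequence = False  # True while inside a CO or RD sequence block
    for raw in handle:
        line = raw.strip()
        if not line:
            # Assumption: a blank line terminates every sequence block.
            in_sequence = False
            continue
        if in_sequence:
            continue  # skip consensus/read bases
        fields = line.split()
        if fields[0] == "CO":
            current = fields[1]
            members[current] = []
            in_sequence = True
        elif fields[0] == "RD":
            in_sequence = True
        elif fields[0] == "AF" and current is not None:
            members[current].append(fields[1])
    return members
```

Calling contig_members(open("assembly.ace")) (the file name is only a placeholder) returns a dict such as {"contig00001": ["read1", "read2"]}, which is roughly the cluster listing mentioned elsewhere in this thread.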
Some time ago I wrote a few tools to extract stuff from ACE files, including the contigs + quality, the assembly as Fasta with '-' for gaps (what you're asking for), and also the clusters (as list of input sequences, a la TGICL output).
Although elegant, the BioPerl/Biopython solutions are slow and tend to keep too many contig objects in memory. A simple ACE-to-FASTA Perl extractor (assuming you want only the contig consensus sequences) would be this:
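#!/usr/bin/perl
# Extract the contig consensus sequences from an ACE file as FASTA.
use strict;
use warnings;

die "Usage: $0 <in.ace> <out.fasta>\n" unless @ARGV == 2;
my ($infile, $outfile) = @ARGV;
open(my $in,  '<', $infile)  or die "Cannot read $infile: $!";
open(my $out, '>', $outfile) or die "Cannot write $outfile: $!";

# Contig header lines look like: CO contig00001 67140 1618 1666 U
my $waitForHeader = 1;
while (my $line = <$in>) {
    if ($waitForHeader) {
        # Skip everything until the next contig header.
        next unless $line =~ /^CO\s/;
        my @splitter = split(" ", $line);
        print $out ">$splitter[1]\n";
        $waitForHeader = 0;
    }
    elsif ($line =~ /^BQ/) {
        # The base-quality block marks the end of the consensus sequence.
        $waitForHeader = 1;
    }
    elsif ($line =~ /\S/) {
        # Consensus line: ACE pads gaps with '*'; write them as '-'.
        $line =~ s/\*/-/g;
        print $out $line;
    }
}
close $in;
close $out or die "Cannot close $outfile: $!";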
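The script above writes only the consensus sequences. For the padded reads the question asks about, a rough Python sketch could look like the following. It rests on several assumptions: the start position on each AF line is the 1-based padded position of the read on the consensus, RD sequences are already in consensus orientation, blank lines end each sequence block, and any part of a read that overhangs the consensus is simply clipped. QA clipping ranges are ignored, so low-quality read ends are kept. The names parse_ace, pad_read and aligned_records are invented for this example.

```python
import io

def parse_ace(handle):
    """Yield (contig_name, consensus, reads); reads is a list of (name, start, seq)."""
    name, cons, starts, reads = None, [], {}, []
    target = None  # list currently collecting sequence lines, if any
    for raw in handle:
        line = raw.strip()
        if not line:
            target = None  # assumption: a blank line ends a sequence block
        elif target is not None:
            target.append(line.replace("*", "-"))
        else:
            fields = line.split()
            if fields[0] == "CO":
                if name is not None:
                    yield name, "".join(cons), [(r, starts[r], "".join(s)) for r, s in reads]
                name, cons, starts, reads = fields[1], [], {}, []
                target = cons
            elif fields[0] == "AF":
                starts[fields[1]] = int(fields[3])
            elif fields[0] == "RD":
                seq = []
                reads.append((fields[1], seq))
                target = seq
    if name is not None:
        yield name, "".join(cons), [(r, starts[r], "".join(s)) for r, s in reads]

def pad_read(seq, start, consensus_len, gap="-"):
    """Pad (and clip) a read so it spans exactly consensus_len columns."""
    left = start - 1
    if left < 0:            # read starts before the consensus: clip the overhang
        seq = seq[-left:]
        left = 0
    left = min(left, consensus_len)
    seq = seq[: consensus_len - left]   # clip any overhang on the right
    right = consensus_len - left - len(seq)
    return gap * left + seq + gap * right

def aligned_records(handle):
    """Yield (name, sequence) pairs: each consensus followed by its padded reads."""
    for name, consensus, reads in parse_ace(handle):
        yield name, consensus
        for read_name, start, seq in reads:
            yield read_name, pad_read(seq, start, len(consensus))

# Tiny made-up ACE fragment, only to show the record layout.
SAMPLE_ACE = """AS 1 2

CO contig1 10 2 1 U
ACGT*ACGTA

BQ
 20 20 20 20 20 20 20 20 20

AF read1 U 1
AF read2 C 4
BS 1 10 read1

RD read1 5 0 0
ACGT*

QA 1 5 1 5
DS CHROMAT_FILE: read1

RD read2 7 0 0
T*ACGTA

QA 1 7 1 7
DS CHROMAT_FILE: read2
"""

for rec_name, rec_seq in aligned_records(io.StringIO(SAMPLE_ACE)):
    print(f">{rec_name}\n{rec_seq}")
```

For a real file, replace the io.StringIO(...) argument with an open file handle and write the records to the output FASTA instead of printing them. Because each contig is yielded as soon as the next CO line is seen, only one contig is held in memory at a time.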
Can you rephrase your question? This does not really make sense to me.
Ah now I get it: you want to convert the output of a sequence assembly (ACE assembly file format) into a FASTA file containing the consensus sequences, right?
Right :) I rephrased it to make it less of a mouthful!