It is a totally justified question, though what is required is an alignment process, not only a conversion, and there are several possible pipelines. Knowing what data you have would also help a lot.
You need data in fasta or fastq format and your reference genome in fasta format.
If your data is in .sff (Standard Flowgram Format), you have to convert it to fasta format using the sffinfo program that comes with the 454 software.
I have a rather old version of the GS FLX manual, and according to it sffinfo doesn't write a fastq file, but a fasta file and a separate quality file. Another option is sff_extract, but that doesn't give fastq either.
The two files can be combined into a fastq file using a simple perl script (I can post one if required), or you can discard the qualities and align the fasta file only.
Then align your 454 reads against the reference sequence/genome using alignment software that can output SAM format and works with "medium length" reads. One tool that aligns fasta directly and outputs SAM is lastz, though you have to play with the switches.
Simple as that ;)
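To check that whatever aligner you pick really produced SAM, it helps to know what an alignment line looks like: tab-separated, with 11 mandatory columns followed by optional tags. Below is a minimal sketch, not a full SAM parser; the sub name parse_sam_line and the example line are made up for illustration and are not output from lastz or any other tool.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The 11 mandatory SAM columns, in order (per the SAM specification).
my @SAM_FIELDS = qw(QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL);

# Hypothetical helper: split one alignment line into a hash keyed by column name.
# Returns undef for header lines (starting with '@') and for lines with
# fewer than 11 columns.
sub parse_sam_line {
    my ($line) = @_;
    chomp $line;
    return undef if $line =~ /^\@/;
    my @cols = split /\t/, $line;
    return undef if @cols < @SAM_FIELDS;
    my %rec;
    @rec{@SAM_FIELDS} = @cols[0 .. $#SAM_FIELDS];
    # anything after the mandatory columns is an optional TAG:TYPE:VALUE field
    $rec{TAGS} = [ @cols[scalar(@SAM_FIELDS) .. $#cols] ];
    return \%rec;
}

# Made-up example line, only to show the column layout
my $line = join "\t", 'read1', 0, 'chr1', 100, 255, '4M', '*', 0, 0, 'ACGT', 'IIII';
my $rec  = parse_sam_line($line);
print "$rec->{QNAME} maps to $rec->{RNAME} at position $rec->{POS}\n";
```

Running something like this over the first few lines of the aligner output is a quick way to see whether the switches you chose really gave you SAM.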
Edit: here is a simple perl script that makes a fastq file out of a fasta file and a quality file. It's not well tested, and if the headers and data in the fasta and qual files do not match exactly, it fails miserably.
#!/usr/bin/env perl
use strict;
use warnings;

die "Usage: fasta2fastq <fasta.file> <qual.file>\n" unless @ARGV == 2;
open my $fasta, '<', $ARGV[0] or die "cannot open fasta: $!\n";
open my $qualfh, '<', $ARGV[1] or die "cannot open qual: $!\n";

my $offset = 33; # Sanger FASTQ uses Phred+33, change this if required!
my $count = 0;
local $/ = "\n>"; # read both files one FASTA-style record at a time

while (my $fastarec = <$fasta>) {
    chomp $fastarec; # removes the trailing "\n>" record separator
    # first line is the header, the remaining lines are the sequence
    my ($fid, @seq) = split "\n", $fastarec;
    my $seq = join "", @seq; $seq =~ s/\s//g;
    my $qualrec = <$qualfh>;
    die "qual file has fewer records than fasta file\n" unless defined $qualrec;
    chomp $qualrec;
    my ($qid, @qual) = split "\n", $qualrec;
    # split ' ' skips leading whitespace, so there is no empty first score
    @qual = split ' ', join(" ", @qual);
    # only the first record of each file still carries its leading '>'
    s/^>// for $fid, $qid;
    die "mismatch of fasta and qual: '$fid' ne '$qid'\n" if $fid ne $qid;
    die "sequence and quality lengths differ for '$fid'\n" if length($seq) != @qual;
    # convert score to character code:
    my $quals = join "", map { chr($_ + $offset) } @qual;
    print join("\n", "\@$fid", $seq, "+$fid", $quals), "\n";
    $count++;
}
close $fasta;
close $qualfh;
print STDERR "wrote $count entries\n";
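Since the conversion depends on the fasta and qual files being exactly in sync, it may be worth sanity-checking the resulting fastq records before aligning. Here is a minimal sketch, assuming four-line fastq records (no wrapped sequences) and the Sanger offset of 33; the sub name check_fastq_record is made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns an error message, or '' if the record looks fine.
# Assumes 4-line records and Phred+33 (Sanger) quality encoding.
sub check_fastq_record {
    my ($head, $seq, $plus, $qual) = @_;
    return "header does not start with '\@'" unless $head =~ /^\@/;
    return "separator line does not start with '+'" unless $plus =~ /^\+/;
    return "sequence and quality lengths differ"
        unless length($seq) == length($qual);
    # Phred+33 characters range from '!' (ASCII 33) to '~' (ASCII 126)
    return "quality character outside Phred+33 range"
        if $qual =~ /[^\x21-\x7e]/;
    return '';
}

# Tiny in-memory example: scores 40 30 20 10 are encoded as "I?5+"
my @scores = (40, 30, 20, 10);
my $qual   = join '', map { chr($_ + 33) } @scores;
my $err    = check_fastq_record('@read1', 'ACGT', '+read1', $qual);
print $err eq '' ? "record ok\n" : "bad record: $err\n";
```

The same check can be applied to each group of four lines read from the generated file.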
This may only partially help, but SSAHA2 reportedly outputs SAM format.
A similar question has also been previously posted on BioStar.
I had success with glu genetics, but you might need to fight the installer as noted on the question I asked and answered.
Try the mosaik aligner. It is well designed for working with 454 data and it supports SAM format (although you need to use mosaiktext to convert mosaikalign.dat to SAM format).
Hi litali, It is rather unclear what you mean by '454 output', mostly since you want to put it in an alignment format. Are you referring to the .sff file that comes out of the Roche sequencer? Or maybe to the sequences once they are assembled, possibly in .ace format? This should help us help you. Cheers.
added the script