Question

Convert Fastq To Afg

0

Entering edit mode

13.0 years ago

Lee Katz ★ 3.2k

Hi all, is there a way to directly convert fastq to afg? Currently I have to convert fastq to fasta/qual, and then convert to afg using toAmos.

Here's a copy/paste of some of my code right now:

my ($fasta,$qual)=fastqToFasta($concatenatedReads,"$$settings{tempdir}/reads");
system("toAmos -s $fasta -q $qual -o $amosPrefix.afg")

conversion fasta fastq • 3.9k views

ADD COMMENT • link 13.0 years ago by Lee Katz ★ 3.2k

0

Entering edit mode

It might seem like this is a problem that doesn't quite need a fix, but each step is rather large because the reads file is so large. So, I appreciate anyone helping me save some computation time by skipping the middle man (ie fasta)

ADD REPLY • link 13.0 years ago by Lee Katz ★ 3.2k

score 1 · Answer 1 · 2011-11-10

Lee, this perl script 'fastq2afg.pl' should do what you want. You'll have to tweak it if your FASTQ files use a different offset. Let me know if it works, and I'll add in some command line options and release it on my website. Torst

#!/usr/bin/perl
use strict;
use warnings;

#    Copyright (C) 2011 Torsten Seemann <torsten@seemann.id.au>
#   
#       http://bioinformatics.net.au
#
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <http://www.gnu.org/licenses/>.

my $FASTQ_OFFSET = 33;    # 33=sanger 64=illumina(<1.8)
my $AFG_OFFSET = 60;

my $iid=0;
while (my $eid = <ARGV>) {

  die "bad fastq ID line '$eid'" unless $eid =~ m/^\@/;
  $eid = substr $eid, 1;  # remove '@'

  my $seq = scalar(<ARGV>);
  chomp $seq;

  my $id2 = scalar(<ARGV>);
  die "bad fastq ID2 line '$id2" unless $id2 =~ m/^\+/;

  my $qlt = scalar(<ARGV>);
  chomp $qlt;

  $iid++;
  print "{RED\n";
  print "iid:$iid\n";
  print "eid:$eid";  # already has \n
  print "seq:\n";
  for (my $i=0; $i < length($seq); $i+=60) {
    print substr($seq, $i, 60), "\n";
  }
  print ".\n";
  print "qlt:\n";
  my @q = split m//, $qlt;
  @q = map { chr( ord($_)-$FASTQ_OFFSET + $AFG_OFFSET ) } @q;
  for (my $i=0; $i < @q; $i++) {
    print $q[$i];
    print "\n" if ($i+1) % 60 == 0;
  }
  print ".\n";
  print "}\n";
}

Ram · Answer 2 · 2011-11-11

I modified Torsten's script so that it is in a neat subroutine, and I modified it a little to make it faster (at the least, I think it could make it faster). It's not totally tested thoroughly as a subroutine yet though.

Thanks so much again Torsten.

# modified from Torsten Seemann <torsten@seemann.id.au>
# A: Convert Fastq To Afg
sub fastqToAfg{
  my($fastq,$outfile,$settings)=@_;
  $$settings{qOffset}||=33; # 33=sanger 64=illumina(<1.8)
  $$settings{aOffset}||=60;
  $|++;

  my $secondLine=`head -2 $fastq|tail -1`;
  chomp($secondLine);
  my $readLength=length($secondLine);

  open(FASTQ,$fastq) or die "Could not open $fastq because $!";
  open(AFG,">",$outfile) or die "Could not open $outfile for writing because $!";
  my $iid=0;
  while (my $eid = <FASTQ>) {

    die "bad fastq ID line '$eid'" unless $eid =~ m/^\@/;
    $eid = substr $eid, 1;  # remove '@'

    my $seq = scalar(<FASTQ>);
    chomp $seq;
    $seq=~s/(.{60})/\1\n/g; #newline every 60 bp

    my $id2 = scalar(<FASTQ>);
    die "bad fastq ID2 line '$id2" unless $id2 =~ m/^\+/;

    my $qlt = scalar(<FASTQ>);
    chomp $qlt;

    $iid++;
    print AFG "{RED\n";
    print AFG "iid:$iid\n";
    print AFG "eid:$eid";  # already has \n
    print AFG "seq:\n$seq\n";
    print AFG "qlt:\n";
    my @q = split m//, $qlt;
    @q = map { chr( ord($_)-$$settings{qOffset} + $$settings{aOffset} ) } @q;
    my $qltAfg=join("",@q);
    $qltAfg=~s/(.{60})/\1\n/g; #newline every 60 bp
    print AFG $qltAfg."\n";
    print AFG "}\n";
    print "." if($iid % 10000 == 0);
    #last if($iid>20);
  }
  print "\n";
  close FASTQ; close AFG;
  $|--;
  return $outfile;
}