Convert Fastq To Afg
2
0
Entering edit mode
13.1 years ago
Lee Katz ★ 3.2k

Hi all, is there a way to directly convert fastq to afg? Currently I have to convert fastq to fasta/qual, and then convert to afg using toAmos.

Here's a copy/paste of some of my code right now:

my ($fasta,$qual)=fastqToFasta($concatenatedReads,"$$settings{tempdir}/reads");
system("toAmos -s $fasta -q $qual -o $amosPrefix.afg")
conversion fasta fastq • 3.9k views
ADD COMMENT
0
Entering edit mode

It might seem like this is a problem that doesn't quite need a fix, but each step is rather large because the reads file is so large. So, I appreciate anyone helping me save some computation time by skipping the middle man (ie fasta)

ADD REPLY
1
Entering edit mode
13.1 years ago
Torst ▴ 980

Lee, this perl script 'fastq2afg.pl' should do what you want. You'll have to tweak it if your FASTQ files use a different offset. Let me know if it works, and I'll add in some command line options and release it on my website. Torst

#!/usr/bin/perl
use strict;
use warnings;

#    Copyright (C) 2011 Torsten Seemann <torsten@seemann.id.au>
#   
#       http://bioinformatics.net.au
#
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <http://www.gnu.org/licenses/>.

my $FASTQ_OFFSET = 33;    # 33=sanger 64=illumina(<1.8)
my $AFG_OFFSET = 60;

my $iid=0;
while (my $eid = <ARGV>) {

  die "bad fastq ID line '$eid'" unless $eid =~ m/^\@/;
  $eid = substr $eid, 1;  # remove '@'

  my $seq = scalar(<ARGV>);
  chomp $seq;

  my $id2 = scalar(<ARGV>);
  die "bad fastq ID2 line '$id2" unless $id2 =~ m/^\+/;

  my $qlt = scalar(<ARGV>);
  chomp $qlt;

  $iid++;
  print "{RED\n";
  print "iid:$iid\n";
  print "eid:$eid";  # already has \n
  print "seq:\n";
  for (my $i=0; $i < length($seq); $i+=60) {
    print substr($seq, $i, 60), "\n";
  }
  print ".\n";
  print "qlt:\n";
  my @q = split m//, $qlt;
  @q = map { chr( ord($_)-$FASTQ_OFFSET + $AFG_OFFSET ) } @q;
  for (my $i=0; $i < @q; $i++) {
    print $q[$i];
    print "\n" if ($i+1) % 60 == 0;
  }
  print ".\n";
  print "}\n";
}
ADD COMMENT
0
Entering edit mode

Thank you so much! I modified it as a subroutine and will include it in a formatted answer below.

ADD REPLY
0
Entering edit mode
13.1 years ago
Lee Katz ★ 3.2k

I modified Torsten's script so that it is in a neat subroutine, and I modified it a little to make it faster (at the least, I think it could make it faster). It's not totally tested thoroughly as a subroutine yet though.

Thanks so much again Torsten.

# modified from Torsten Seemann <torsten@seemann.id.au>
# A: Convert Fastq To Afg
sub fastqToAfg{
  my($fastq,$outfile,$settings)=@_;
  $$settings{qOffset}||=33; # 33=sanger 64=illumina(<1.8)
  $$settings{aOffset}||=60;
  $|++;

  my $secondLine=`head -2 $fastq|tail -1`;
  chomp($secondLine);
  my $readLength=length($secondLine);

  open(FASTQ,$fastq) or die "Could not open $fastq because $!";
  open(AFG,">",$outfile) or die "Could not open $outfile for writing because $!";
  my $iid=0;
  while (my $eid = <FASTQ>) {

    die "bad fastq ID line '$eid'" unless $eid =~ m/^\@/;
    $eid = substr $eid, 1;  # remove '@'

    my $seq = scalar(<FASTQ>);
    chomp $seq;
    $seq=~s/(.{60})/\1\n/g; #newline every 60 bp

    my $id2 = scalar(<FASTQ>);
    die "bad fastq ID2 line '$id2" unless $id2 =~ m/^\+/;

    my $qlt = scalar(<FASTQ>);
    chomp $qlt;

    $iid++;
    print AFG "{RED\n";
    print AFG "iid:$iid\n";
    print AFG "eid:$eid";  # already has \n
    print AFG "seq:\n$seq\n";
    print AFG "qlt:\n";
    my @q = split m//, $qlt;
    @q = map { chr( ord($_)-$$settings{qOffset} + $$settings{aOffset} ) } @q;
    my $qltAfg=join("",@q);
    $qltAfg=~s/(.{60})/\1\n/g; #newline every 60 bp
    print AFG $qltAfg."\n";
    print AFG "}\n";
    print "." if($iid % 10000 == 0);
    #last if($iid>20);
  }
  print "\n";
  close FASTQ; close AFG;
  $|--;
  return $outfile;
}
ADD COMMENT
1
Entering edit mode

I was going to use the s/(.{60})/ trick but didn't in the end. I reckon the substr might be faster, but ultimately doesn't matter much. I note your $readLength variable is never used? If it is, make sure it is in the INNER loop, as reads can be variable length. Also be aware that my/our script does not handle FASTQ where the seq/qual is spread over multiple lines! Also $|++ will SLOW it down as it turns off output buffering. And the map/split qlt changer could probably be done efficiently in situ with a pack() or tr// command.

ADD REPLY
0
Entering edit mode

I think I was in the middle of testing the subroutine and left in some artifacts. Your points are absolutely right on target, and I will fix my subroutine on my computer. I would love to see that tr// command but that's not something I trust myself doing (I wish I were as good as you at all this!)

ADD REPLY

Login before adding your answer.

Traffic: 1400 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6