How To Remove The Identifier And Quality From A Fastq File
12.0 years ago

I have a fastq file and need to remove the identifier, quality, and white space. So far my script looks something like this:

#!/usr/bin/perl -w

use strict;

my $inputFileName = shift;
my $outputFileName = shift;

open INPUTFILE, "<$inputFileName" or die "poop";
open OUTPUT, ">$outputFileName" or die "poop";

my @bases = ('A', 'G', 'T', 'C');
my $line;

while ($line = <INPUTFILE>) {
  chomp $line;
  if ($line =~ /^\s*$/)
  elsif ($line =~ /^\s*@/)
  elsif ($line =~ /^+/)   
  else {print OUTPUT $line, "\n";
}
}

However, I keep getting an empty output file. I'm very new to Perl, so be gentle.

Thanks!!

fastq perl • 4.6k views

The FASTQ format doesn't contain blank lines, so that check isn't needed. To get only the sequences, it's better to count lines, as in the solutions proposed below. Also, your if and elsif conditions have no blocks after them; each one needs a block, for example { next; }.

12.0 years ago
Irsan ★ 7.8k
#!/usr/bin/env perl
use strict;
use warnings;

my $lines = 4; # a FASTQ record spans 4 lines
my $input = shift @ARGV;

open(my $in, '<', $input) or die "Cannot open $input: $!";
while (<$in>) {
    # $. holds the current line number; line 2 of each record is the sequence
    if ($. % $lines == 2) { # the % character is the modulo operator Pierre was talking about
        print;
    }
}
close($in);

Put that in a file called extract_reads_from_fastq.pl and run it with:

perl extract_reads_from_fastq.pl input.fastq

And boom goes the dynamite!! Thanks, this worked perfectly!

12.0 years ago

If your FASTQ file uses the standard layout (4 lines per record), you could just count the lines and keep those with the correct modulo: the sequence is the line whose number modulo 4 equals 2.

For example, using awk:

awk '(NR%4==2)'  < file.fastq
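
Since this approach depends on the strict 4-lines-per-record layout, it may be worth checking that assumption while extracting. Here is a minimal sketch; fastq_sequences is a made-up helper name and the two-read sample file is invented for illustration. It prints the sequence lines and warns on stderr if the total line count is not a multiple of 4 (for example, a truncated file or wrapped sequences).

```shell
# fastq_sequences is a hypothetical helper name, not part of any toolkit.
fastq_sequences() {
    # NR is awk's running line number; in a 4-line record the sequence is line 2.
    # The END block warns when the file cannot be made of whole 4-line records.
    awk 'NR % 4 == 2
         END { if (NR % 4 != 0) print "warning: line count is not a multiple of 4" > "/dev/stderr" }' "$1"
}

# Tiny made-up input for demonstration: two reads, four lines each
sample=$(mktemp)
printf '@read1\nACGT\n+\n!!!!\n@read2\nGGCC\n+\n####\n' > "$sample"

fastq_sequences "$sample"
rm -f "$sample"
```

Redirect the output to a file (fastq_sequences input.fastq > sequences.txt) to keep the result.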
12.0 years ago

You could also do something like this:

#!/usr/bin/perl
use strict;
use warnings;

my $file = shift;
open my $F, '<', $file or die "Cannot open $file: $!";
LINE: while (my $line = <$F>) {
    chomp $line;
    next LINE if $line =~ /^@/;  # gets rid of line 1 of fastq
    next LINE if $line =~ /^\+/; # gets rid of line 3
    next LINE if $line =~ /^!/;  # gets rid of line 4 if it begins with a ! -- check your file's format

    print "$line\n"; # put back the newline that chomp removed
}
close $F;
print STDERR "Done.\n";

Of course there's probably a nice one-liner too for this...


I used to parse FASTQ files with regexes like those, but Illumina 1.8+ (Phred+33) breaks that: quality lines can begin with almost any printable character, including @ and +.
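
To make that concrete, here is a small sketch on a made-up two-read Phred+33 file (read names, sequences, and qualities are invented). Filtering by the first character returns three lines for two reads, because the second quality string does not start with a character the filter recognises. Filtering by line position returns exactly the two sequences.

```shell
# Made-up two-record FASTQ; the first quality line starts with '@' (Q31 in Phred+33)
sample=$(mktemp)
printf '@read1\nACGT\n+\n@IIH\n@read2\nGGCC\n+\nIIII\n' > "$sample"

# First-character filter: drop every line starting with '@' or '+'
by_regex=$(grep -v '^[@+]' "$sample")

# Position-based filter: keep line 2 of every 4-line record
by_position=$(awk 'NR % 4 == 2' "$sample")

printf 'regex:\n%s\n\nposition:\n%s\n' "$by_regex" "$by_position"
rm -f "$sample"
```

The first-character filter lets the quality string IIII through as if it were a sequence, and it only drops @IIH by coincidence.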

12.0 years ago
JC 13k

Perl one liner:

perl -ne 'print if (++$n % 4 == 2)' < file.fq > output