9.0 years ago
PAn
I need to write a Perl script that reads the paths of gzipped fastq files from a text file, concatenates the files, and writes the result to a new gzipped file. (I need to do this in Perl because it will be part of a pipeline.) I am not sure how to accomplish the zcat and concatenation part. Since the files are gigabytes in size, I also need to keep storage use and run time in check.
So far I have come up with this:
use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError);

#------- check that a file list was specified -------#
if (@ARGV != 1) {
    print "\nUsage: name.pl Filelist.txt\n";
    exit 1;
}
my $file_list = $ARGV[0];

#------- read the list of gzipped file paths into an array -------#
open(my $list_fh, '<', $file_list) or die "Could not open file '$file_list' $!";
chomp(my @fastqc_files = <$list_fh>);   # one path per line
close($list_fh);

#------- zcat each file and write its lines to one gzipped output -------#
my $outputfile = "combined.txt.gz";
my $combined_file = IO::Compress::Gzip->new($outputfile)
    or die "gzip failed: $GzipError\n";
for my $fastqc_file (@fastqc_files) {
    next unless length $fastqc_file;    # skip blank lines in the list
    open(my $in, '-|', 'zcat', $fastqc_file)
        or die("Can't open pipe from command 'zcat $fastqc_file' : $!\n");
    while (my $line = <$in>) {
        print {$combined_file} $line;   # recompressed on the fly
    }
    close($in) or die "zcat failed for '$fastqc_file'\n";
}
$combined_file->close();
Somehow I am not able to get it to run. Can anyone suggest a simpler or more correct way to accomplish this? Thanks!
Using zcat and then recompressing is unnecessary; see A: How To Merge Two Fastq.Gz Files?
What would be a better way to combine gzip files then? I basically need to stitch them together, not just combine gzip files into one big gzip file (and I need to take the multi-GB file sizes into account too).
The point is that "stitching them together" just means concatenating them. There is no difference. You can do this in one line (without perl) as Pierre's comment suggests.
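Since a gzip file may hold several compressed members back to back, copying the raw bytes of each input into one output should give a valid .gz whose decompressed content is the inputs in order. Here is a minimal Perl sketch of that idea, assuming the list file holds one path per line. The sub name concat_gz and the file names in the usage note are hypothetical, not from the thread.

```perl
use strict;
use warnings;

# Append the raw bytes of each gzipped file named in $list_file to $out_file.
# Nothing is decompressed, so memory use stays at one buffer.
sub concat_gz {
    my ($list_file, $out_file) = @_;

    open(my $list, '<', $list_file) or die "Cannot open '$list_file': $!";
    chomp(my @paths = grep { /\S/ } <$list>);   # skip blank lines
    close($list);

    # '>' truncates any output left over from an earlier failed run.
    open(my $out, '>', $out_file) or die "Cannot open '$out_file': $!";
    binmode($out);

    for my $path (@paths) {
        open(my $in, '<', $path) or die "Cannot open '$path': $!";
        binmode($in);
        my $buf;
        while (read($in, $buf, 1 << 20)) {      # copy in 1 MiB chunks
            print {$out} $buf;
        }
        close($in);
    }
    close($out) or die "Cannot close '$out_file': $!";
    return scalar @paths;                       # number of files appended
}
```

A call such as concat_gz('Filelist.txt', 'combined.fastq.gz') would write the combined file. Command-line tools like zcat read a multi-member file straight through; when reading it back with IO::Uncompress::Gunzip, the MultiStream option has to be switched on.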
Thanks Pierre and Sean. I understand it's better to run this as a one-line command than in Perl, but I really need to run it in Perl, as I need to implement it in a pipeline that has other components, config files, an XML caller, etc. I will give it another shot; otherwise I will tell the collaborators to settle for the one-liner (I prefer it as well)!
You can run a shell command from Perl. A little googling will tell you how.
Thanks Sean, yes, I got it running by simply using a system zcat command in the script. Thanks!
Glad it worked out for you. Remember to "remove" your output file before entering the loop so that if the script has failed, you don't simply append to the "bad" file.
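The advice about removing the output file first can be sketched as follows. This is only an illustration under assumptions: the exact command the poster ran is not shown in the thread, so the sketch appends with cat (as the earlier comments suggest) rather than zcat, and the helper name append_with_system is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove stale output, then append each input with a
# shell command run through system().
sub append_with_system {
    my ($out_file, @gz_files) = @_;

    # Start clean so a rerun after a failure never appends to a bad file.
    if (-e $out_file) {
        unlink($out_file) or die "Cannot remove '$out_file': $!";
    }

    for my $gz (@gz_files) {
        # Paths are passed as positional parameters ($1, $2) so the shell
        # never re-parses spaces or quotes inside them.
        my @cmd = ('sh', '-c', 'cat "$1" >> "$2"', 'sh', $gz, $out_file);
        system(@cmd) == 0
            or die "Command failed for '$gz' (exit status " . ($? >> 8) . ")\n";
    }
    return;
}
```

If decompressing and recompressing really is wanted, the shell snippet could be swapped for something like gzip -dc "$1" | gzip -c >> "$2", at the cost of extra run time.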
Oh yes, that's right. Thanks for pointing it out. I have another question: can I use ARGV for the input file instead of specifying it in the script? I tried modifying the script to read the file name from ARGV instead of hard-coding the path to the input file, but it shows an error. Can you please point out what is wrong? Sorry, it must be very trivial.
Thanks!
You'll definitely need to do a little reading on command-line arguments in Perl. For example:
http://alvinalexander.com/perl/perl-command-line-arguments-read-args
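As a rough illustration of reading the input file from the command line: Perl puts the arguments in the @ARGV array, so the first one is $ARGV[0]. The helper name input_file_from_args below is hypothetical, and the poster's actual script is not shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the list-file path from the argument list,
# or die with a usage message when the argument count is wrong.
sub input_file_from_args {
    my @args = @_;
    die "Usage: $0 Filelist.txt\n" unless @args == 1;
    my ($file_list) = @args;          # same value as $args[0]
    die "No such file: '$file_list'\n" unless -f $file_list;
    return $file_list;
}

# In a script this would be called with the real command line:
#   my $file_list = input_file_from_args(@ARGV);
```

Checking the argument count before using $ARGV[0] avoids an "uninitialized value" warning when the script is started without a file name.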
Yes, working on the Perl basics. Thanks, it works now!