Hi,
I want to merge multiple .fastq.gz files (forward/reverse) and am using the following command:
zcat dir1/ETH002281_ACAGTG_L00*_R1_00*.fastq.gz dir2/ETH002281_ACAGTG_L00*_R1_00*.fastq.gz dir3/ETH002281_ACAGTG_L003_R1_001.fastq.gz | gzip > dir4/ETH002281_ACAGTG_Lall_R1.gz
Although it runs fine, it takes a huge amount of time because I can only run it on a single node. I have access to 15 nodes with 8 cores each and would like to spread the work across them. It would be great to get an idea of how to merge multiple fastq.gz files using several compute nodes, so that the job finishes as quickly as possible using the nodes' full computational power.
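For what it's worth, the only split I have thought of myself is running the forward and reverse merges as separate jobs so that at least two cores are busy at once. A rough sketch of what I mean (plain shell background jobs; actual submission would go through our scheduler):

# Merge R1 and R2 in parallel as two background jobs, one per read direction.
for READ in R1 R2; do
    zcat dir1/ETH002281_ACAGTG_L00*_${READ}_00*.fastq.gz \
         dir2/ETH002281_ACAGTG_L00*_${READ}_00*.fastq.gz \
         dir3/ETH002281_ACAGTG_L003_${READ}_001.fastq.gz \
      | gzip > dir4/ETH002281_ACAGTG_Lall_${READ}.fastq.gz &
done
wait    # block until both background jobs have finished

Thanks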
Thank you for the response. I'll try pigz in place of gzip in the zcat | gzip pipeline. True, the cat command is much faster (approx. 40x) than zcat | gzip, but I want to avoid it because it doesn't recompress the merged file, and I expect that to make a difference of GBs in the final merged file sizes.
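Something along these lines is what I have in mind, assuming pigz is installed on the node (-p sets the number of compression threads; 8 matches our cores per node):

zcat dir1/ETH002281_ACAGTG_L00*_R1_00*.fastq.gz \
     dir2/ETH002281_ACAGTG_L00*_R1_00*.fastq.gz \
     dir3/ETH002281_ACAGTG_L003_R1_001.fastq.gz \
  | pigz -p 8 > dir4/ETH002281_ACAGTG_Lall_R1.fastq.gz

The zcat side of the pipe stays single-threaded either way, so I understand the speed-up would only apply to the compression half.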
You can concatenate gzipped files and the result is still a valid compressed gzipped file; I don't really see any reason to avoid that. The difference in compression would be negligible compared to recompressing it unless you have millions of tiny files.
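In other words, something like this should be all you need; the gzip -t at the end is just an optional integrity check:

# Plain byte-wise concatenation; the result is a valid multi-member gzip file.
cat dir1/ETH002281_ACAGTG_L00*_R1_00*.fastq.gz \
    dir2/ETH002281_ACAGTG_L00*_R1_00*.fastq.gz \
    dir3/ETH002281_ACAGTG_L003_R1_001.fastq.gz \
  > dir4/ETH002281_ACAGTG_Lall_R1.fastq.gz
gzip -t dir4/ETH002281_ACAGTG_Lall_R1.fastq.gz    # exits non-zero if the stream is corrupt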
I agree that cat-ing gzip files is the best solution here. However, I vaguely remember that, strictly speaking, a gzip file produced by concatenating individual gzips is not "valid", since the footer of the concatenated file describes only the last gzip member rather than the whole file.
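You can see the effect with gzip -l, which (at least in older GNU gzip versions, if I remember right) takes the uncompressed size from the trailing ISIZE field, i.e. from the last member only; decompressing and counting bytes gives the real size. A quick illustration with two throwaway files:

printf 'first\n'  | gzip > a.gz    # 6 bytes uncompressed
printf 'second\n' | gzip > b.gz    # 7 bytes uncompressed
cat a.gz b.gz > ab.gz
gzip -l ab.gz            # "uncompressed" may show only the last member's 7 bytes
zcat ab.gz | wc -c       # prints 13, the true decompressed size

Tools that actually decompress the stream (zcat, gunzip -c, etc.) handle multi-member files fine; it is only metadata shortcuts like this that get confused.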