suny.bio, 6.5 years ago
I am creating a single .fastq.gz file from many .fastq.gz files with the following command:
zcat 15_S15*.fastq.gz | gzip -c > combined_file.fastq.gz
I want to keep my original .fastq.gz files and create a separate combined .fastq.gz file, which is why I use gzip -c.
Now I want to do the same with GNU parallel.
Can anyone help?
In addition to ATpoint's answer, you could have a look at pigz for parallel compression:
cat 15_S15*.fastq.gz | pigz -p 4 > combined_file.fastq.gz
Works beautifully. Thanks a lot, WouterDeCoster!
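One caveat worth checking: cat passes the gzip bytes through unchanged, so `cat ... | pigz` compresses data that is already compressed, and the result has to be decompressed twice to get back to FASTQ. If recompressing with several threads is really what you want, a minimal sketch decompresses first with zcat. It assumes pigz may or may not be installed (falling back to single-threaded gzip) and uses two tiny made-up files in a temporary directory as stand-ins for the real 15_S15*.fastq.gz inputs:

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical inputs in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '@r1\nACGT\n+\nIIII\n' | gzip -c > 15_S15_a.fastq.gz
printf '@r2\nTTGA\n+\nIIII\n' | gzip -c > 15_S15_b.fastq.gz

# Use pigz with 4 threads if it is installed, otherwise plain gzip.
if command -v pigz >/dev/null 2>&1; then
  recompress() { pigz -p 4 -c; }
else
  recompress() { gzip -c; }
fi

# zcat (not cat) so the compressor sees plain FASTQ text, not gzip bytes.
zcat 15_S15*.fastq.gz | recompress > combined_file.fastq.gz

# A single decompression now gives back the FASTQ records.
zcat combined_file.fastq.gz
```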
This makes no sense. You're compressing already-compressed files, which only adds to your runtime. All you have to do is write the output of cat to a file, as ATpoint shows; you don't need to recompress it. See http://mattmahoney.net/dc/dce.html#Section_11
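The reason plain cat is enough: the gzip format (RFC 1952) allows a file to consist of several compressed members back to back, and gzip/zcat decompress all of them in sequence. A small sketch to convince yourself, using two made-up files in a temporary directory as stand-ins for the real inputs:

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical inputs in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '@r1\nACGT\n+\nIIII\n' | gzip -c > 15_S15_a.fastq.gz
printf '@r2\nTTGA\n+\nIIII\n' | gzip -c > 15_S15_b.fastq.gz

# Byte-level concatenation; the original files are left untouched.
cat 15_S15*.fastq.gz > combined_file.fastq.gz

# gzip -t checks the integrity of every member in the file.
gzip -t combined_file.fastq.gz && echo "valid gzip"

# Decompressing the combined file gives the same text as decompressing the parts.
if [ "$(zcat combined_file.fastq.gz | md5sum)" = "$(zcat 15_S15_a.fastq.gz 15_S15_b.fastq.gz | md5sum)" ]; then
  echo "identical content"
fi
```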
I don't think you can write to a single file handle from multiple independent processes (you could do a small test to convince yourself). Parallel does not make sense in this case.
zcat 15_S15*.fastq.gz | parallel --pipe --block 2M > output.fastq.gz
or
zcat 15_S15*.fastq.gz | parallel --pipe -N140000 > output.fastq.gz
The block size (--block, here 2 MB) and the number of records per job (-N, here 140000 lines) can be adjusted.
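Two things to watch with these commands: by default parallel prints each job's output in the order the jobs finish, not in input order, and --block cuts chunks at line boundaries rather than at 4-line FASTQ record boundaries, so a read can end up split across two chunks that are then printed out of order. A minimal sketch, assuming GNU parallel is installed, that passes an explicit cat command, uses -N with a multiple of 4 lines and adds -k (--keep-order) so the output matches plain zcat. The inputs are made up, and the script falls back to plain zcat when GNU parallel is not found:

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical inputs, 1000 reads split over two files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

for part in a b; do
  for i in $(seq 1 500); do
    printf '@%s_%d\nACGTACGT\n+\nIIIIIIII\n' "$part" "$i"
  done | gzip -c > "15_S15_${part}.fastq.gz"
done

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # With --pipe a record is one line by default, so -N 400 hands each
  # job 100 complete FASTQ reads; -k keeps the chunks in input order.
  zcat 15_S15*.fastq.gz | parallel --pipe -k -N 400 cat | gzip -c > output.fastq.gz
else
  # Fallback: same result without parallel.
  zcat 15_S15*.fastq.gz | gzip -c > output.fastq.gz
fi
```

This still decompresses and recompresses everything, so as pointed out in the replies it will not beat plain cat for this task.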
Do you know first-hand whether this will produce sane results? See my comment above.
Setting aside the efficiency of the programs (cat and parallel) for this task, the outputs from cat and parallel are below:
Input:
Output from cat, zcat and parallel (gzip was used to compress the resulting FASTQ):
The md5sums differ because parallel's output is not in the same order as the input.
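If you want to check that two FASTQ files contain the same reads regardless of order, one option is to checksum the records rather than the raw bytes. A small sketch: `paste - - - -` joins each 4-line record onto one line, sort puts the records in a fixed order, and the md5 of that is order-independent. The file names and reads are made up, and the helper record_md5 is hypothetical; `gzip -dcf` is used so the same function accepts both .fastq and .fastq.gz:

```shell
#!/usr/bin/env bash
# Sketch only: two hypothetical files with the same reads in different order.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > from_cat.fastq
printf '@r2\nTTGA\n+\nIIII\n@r1\nACGT\n+\nIIII\n' > from_parallel.fastq

# Hypothetical helper: order-independent checksum of FASTQ records.
# gzip -dcf decompresses .gz input and passes plain text through unchanged.
record_md5() {
  gzip -dcf "$1" | paste - - - - | LC_ALL=C sort | md5sum | cut -d' ' -f1
}

# Raw checksums differ because the byte order differs.
md5sum from_cat.fastq from_parallel.fastq

if [ "$(record_md5 from_cat.fastq)" = "$(record_md5 from_parallel.fastq)" ]; then
  echo "same reads, different order"
fi
```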
MultiQC results from FastQC on both files:
I'm getting an error.
@OP: Try
Results on example files:
This works like a charm.
Thanks a lot, cpad0112!
But there is no need to decompress and recompress; ATpoint's answer is what you need, not this. Okay, it works, but it can't be as efficient.
@OP: The output will be in plain .fastq format, not gzipped. I overlooked that part. You need to add a compression step (e.g. gzip) to get a .gz file.
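One way to add the compression and still make use of parallel is to run gzip inside each job: every job emits a complete gzip member, and because concatenated gzip members form a valid .gz file, the combined output can be read with zcat directly. A sketch under the same assumptions (GNU parallel installed, made-up inputs, -N a multiple of 4, -k to keep input order), falling back to a single gzip when GNU parallel is missing:

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical inputs, 1000 reads split over two files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

for part in a b; do
  for i in $(seq 1 500); do
    printf '@%s_%d\nACGTACGT\n+\nIIIIIIII\n' "$part" "$i"
  done | gzip -c > "15_S15_${part}.fastq.gz"
done

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # Each job compresses its own chunk of 100 reads into one gzip member;
  # -k writes the members in input order.
  zcat 15_S15*.fastq.gz | parallel --pipe -k -N 400 gzip -c > output.fastq.gz
else
  # Fallback: one single-threaded gzip over the whole stream.
  zcat 15_S15*.fastq.gz | gzip -c > output.fastq.gz
fi

gzip -t output.fastq.gz && echo "output.fastq.gz is valid gzip"
```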