Batch concatenate fastq files in series?
2.3 years ago

I have a list of 80 fastq files. These are 40 samples with 2 technical replicates each that I'd like to concatenate.

Rather than repeating a command for every sample, e.g.:

cat sample1n1.fastq sample1n2.fastq > sample1_cat.fastq

cat sample2n1.fastq sample2n2.fastq > sample2_cat.fastq

etc...

Is there a command that automates this?

Thanks

concatenate fastq batch • 1.5k views
 cat ample1n*.fastq > sample1_cat.fastq

For gzip files:

 zcat ample1n*.fastq.gz | gzip > sample1_cat.fastq.gz

@shenwei356 this is not right: the file name pattern is missing an s at the beginning. It may also cause a problem because the concatenated output file could be matched by the wildcard.


You're right. To avoid re-reading the existing output file, one can write the output to a different directory (not the current one), or give it a different file extension such as .fq.

Or filter the output file out of the list (though this seems too verbose):

 echo -n > sample1_cat.fastq
 ls sample1n*.fastq | grep -v sample1_cat.fastq | while read -r f; do cat "$f" >> sample1_cat.fastq; done
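
The "different directory" suggestion above can be sketched as a plain shell loop over all samples. This is only a minimal sketch: it assumes the sampleXn1.fastq / sampleXn2.fastq naming from the question, and the scratch directory, the dummy reads, and the merged/ directory name are hypothetical, created here just so the example runs on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo setup in a scratch directory (file contents are made up).
workdir=$(mktemp -d)
cd "$workdir"
for s in sample1 sample2; do
  printf '@%s_rep1\nACGT\n+\nIIII\n' "$s" > "${s}n1.fastq"
  printf '@%s_rep2\nTTGA\n+\nIIII\n' "$s" > "${s}n2.fastq"
done

# Write merged files to a separate directory so that no input
# wildcard in the current directory can ever match an output file.
mkdir -p merged
for first in *n1.fastq; do
  sample=${first%n1.fastq}          # e.g. sample1n1.fastq -> sample1
  cat "${sample}n1.fastq" "${sample}n2.fastq" > "merged/${sample}_cat.fastq"
done

ls merged
```

For real data, only the second loop is needed; run it in the directory that holds the fastq files.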
2.3 years ago

Here is a GNU parallel solution, since it's convenient.

parallel -kj 1 --link --dry-run cat {1} {2} '>' '{=1 s/n[12]\.fastq$// =}_cat.fastq' ::: *n1.fastq ::: *n2.fastq

Remove --dry-run if the commands look good.


Would you mind expounding a bit on what this part of the code is doing?

{=1 s/n[12]\.fastq$// =}

Seems like this could be very useful if I understood it a bit better.


GNU parallel has a few ways to replace or remove parts of strings. For example, {.} removes the extension, {/} removes the path, and {/.} removes both the path and the extension. If you want more control over the replacement, you can pass a Perl expression (the substitution syntax is similar to sed) via {= s/regex/replacement/ =}. In this case the regex n[12]\.fastq$ matches (for example) n1.fastq at the end of sample1n1.fastq and replaces it with nothing, leaving sample1. Note that the Perl replacement starts with {=1 in the actual code because it is applied to the first input source, i.e. the n1 file of each pair, to come up with the final name.

See the documentation for more information.
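
To see what the substitution does outside of GNU parallel, the same regex can be tried on a single file name in the shell. This is a small sketch using the example name from the question; the variable names are made up for illustration, and sed stands in here for the Perl expression that parallel actually evaluates.

```shell
#!/usr/bin/env bash
set -euo pipefail

name="sample1n1.fastq"   # example file name from the question

# sed substitution, analogous to the Perl expression s/n[12]\.fastq$//
with_sed=$(printf '%s\n' "$name" | sed -E 's/n[12]\.fastq$//')

# Pure-shell alternative: strip the matching suffix with parameter expansion
with_expansion=${name%n[12].fastq}

# Both forms yield the prefix that the output name is built from.
echo "${with_sed}_cat.fastq"
echo "${with_expansion}_cat.fastq"
```

Both lines print sample1_cat.fastq, which is the output name the parallel command builds for that pair.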


Thanks, I appreciate it!
