Question

Concatenating 4 files into 1

0

22 months ago

Roland ▴ 20

Hi.

I'm trying to concatenate 4 files into one. This is what my raw data looks like:

> S9_L001_R1_001_1P.fq.gz  

> S9_L001_R1_001_1U.fq.gz

> S9_L001_R1_001_2P.fq.gz 

> S9_L001_R1_001_2U.fq.gz

> S10_L001_R1_001_1P.fq.gz  

> S10_L001_R1_001_1U.fq.gz

> S10_L001_R1_001_2P.fq.gz 

> S10_L001_R1_001_2U.fq.gz

I have twenty samples (S1-20), and each sample consists of four files (1P, 1U, 2P and 2U). This is the code I came up with, but it doesn't work:

for i in {1..20}; 
do for j in 1 2; 
do cat S${i}_L001_R1_001_${j}*.fq.gz >S${i}_concatenate.fq.gz; done; done

It only concatenates two of the four files from each sample.

Any suggestions? Thanks.

ADD COMMENT • link updated 22 months ago by DavidStreid ▴ 90 • written 22 months ago by Roland ▴ 20

0

I hope there is a reason you are trying to cat these together. Based on the names it looks like these are properly paired and unpaired reads after trimming.

Your code is ignoring the 1P, 1U, 2P and 2U in the names. In what order do you want to concatenate those pieces?

ADD REPLY • link 22 months ago by GenoMax 147k

0

Since I'm not mapping the reads to a reference genome or building my own, I figured I might as well treat them as single end reads.

I don't think it matters what order I map them in, but I guess 1P-1U-2P-2U.

ADD REPLY • link 22 months ago by Roland ▴ 20

score 1 · Answer 1 · 2023-01-23

1

22 months ago

DavidStreid ▴ 90

Change the > to >> in the inner loop.

> writes a new file, overwriting anything already there.
>> creates the file if it does not exist, and appends to it if it does.

Your code only writes the two S${i}_L001_R1_001_2*.fq.gz files for any given i, because the second pass through the inner loop overwrites the output of the S${i}_L001_R1_001_1*.fq.gz files.

for i in {1..20}; do 
  for j in 1 2; do
    # ONLY CHANGE: ">" => ">>"
    cat S${i}_L001_R1_001_${j}*.fq.gz >> S${i}_concatenate.fq.gz;
  done
done
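To make the difference between > and >> concrete, here is a minimal sketch that runs in a scratch directory. The file names (part1.txt, part2.txt, overwrite.txt, append.txt) are made up for illustration and are not part of the original data. It also shows one caveat worth keeping in mind: because >> appends to whatever is already there, a stale output file from an earlier run should be removed before rerunning the loop.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two tiny stand-in input files (hypothetical names).
printf 'A\n' > part1.txt
printf 'B\n' > part2.txt

# ">" truncates the target on every pass, so only the last input survives.
for f in part1.txt part2.txt; do
  cat "$f" > overwrite.txt
done

# ">>" appends, so both inputs end up in the target.
# Remove any stale output first, otherwise a rerun would append again.
rm -f append.txt
for f in part1.txt part2.txt; do
  cat "$f" >> append.txt
done

overwrite_lines=$(wc -l < overwrite.txt | tr -d ' ')
append_lines=$(wc -l < append.txt | tr -d ' ')
echo "overwrite.txt: $overwrite_lines line(s); append.txt: $append_lines line(s)"
```

In the loop above, the equivalent safeguard would be an rm -f S${i}_concatenate.fq.gz placed before the inner loop starts.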

ADD COMMENT • link 22 months ago by DavidStreid ▴ 90

1

Thank you so much! This worked.

ADD REPLY • link 22 months ago by Roland ▴ 20

0

Good luck, np!

ADD REPLY • link 22 months ago by DavidStreid ▴ 90

score 0 · Answer 2 · 2023-01-23

0

22 months ago

Mensur Dlakic ★ 28k

I am all for writing code to support tedious tasks, and I hope you get your answer. That said, it seems easier to type cat and paste 10 names, and do so twice, than to wait for responses here.

From what I can tell, the only thing that needs changing is * to ?

for i in {1..20};
do for j in 1 2;
do cat S${i}_L001_R1_001_${j}?.fq.gz > S${i}_concatenate.fq.gz; done; done

When in doubt, I suggest you put an echo command in front of your actual command. It will print everything on screen without executing it, so it may be easier to troubleshoot what is wrong.

for i in {1..20};
do for j in 1 2;
do echo "cat S${i}_L001_R1_001_${j}?.fq.gz > S${i}_concatenate.fq.gz" ; done; done
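One detail about the echo tip: inside double quotes the shell does not expand ?, so the quoted form prints the pattern itself rather than the files it matches. A possible variant, sketched below, leaves the glob outside the quotes so the dry run lists the actual file names. It uses empty stand-in files for a single sample (S9) in a scratch directory; those files and the dry_run variable are assumptions made for the demo.

```shell
# Scratch directory with empty stand-in files for one sample.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for suffix in 1P 1U 2P 2U; do
  : > "S9_L001_R1_001_${suffix}.fq.gz"
done

i=9
# Dry run: the glob is unquoted so the shell expands it to real file names;
# the ">" is quoted so it is printed instead of acting as a redirection.
dry_run=$(
  for j in 1 2; do
    echo cat S${i}_L001_R1_001_${j}?.fq.gz ">" "S${i}_concatenate.fq.gz"
  done
)
echo "$dry_run"
```

Printed this way, both passes of the inner loop show the same target file, which makes it easy to spot that the second pass would overwrite the first when a single > is used.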

ADD COMMENT • link 22 months ago by Mensur Dlakic ★ 28k

0

Maybe this will do the trick:

for i in {1..20};
do cat S${i}_L001_R1_001_??.fq.gz > S${i}_concatenate.fq.gz; done
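The single-loop version avoids the overwrite because one ?? pattern matches all four files of a sample, so each output file is written exactly once. The shell expands a glob in sorted order, which here gives 1P, 1U, 2P, 2U. The sketch below checks this on tiny gzip files created for the purpose; the "read_from_..." contents and the sample and merged variables are invented for the demo. It relies on the fact that gzip files joined with cat still form a valid gzip stream.

```shell
# Scratch directory with four small gzip files standing in for one sample.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
sample=S9
for suffix in 1P 1U 2P 2U; do
  printf 'read_from_%s\n' "$suffix" | gzip > "${sample}_L001_R1_001_${suffix}.fq.gz"
done

# One glob matches all four inputs, so a single ">" is enough.
cat ${sample}_L001_R1_001_??.fq.gz > "${sample}_concatenate.fq.gz"

# Check that the result is a readable gzip stream, then look at the order.
gzip -t "${sample}_concatenate.fq.gz" && echo "archive OK"
merged=$(gzip -dc "${sample}_concatenate.fq.gz")
echo "$merged"
```

If a different order were needed, the four file names could be listed explicitly after cat instead of relying on the glob.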

ADD REPLY • link 22 months ago by Mensur Dlakic ★ 28k

0

~~What do the ? vs. ?? do?~~

Ah, I just tried it. The ? is very helpful as a wildcard, thank you.

ADD REPLY • link 22 months ago by DavidStreid ▴ 90

0

Thank you for your help. I'm currently working with some "test" samples in preparation for my real data, which consists of well over 200 samples. That's why I'd like to have it automated!

ADD REPLY • link 22 months ago by Roland ▴ 20