Question

How to append two fastq files?

5.1 years ago

debitboro ▴ 270

Dear Biostars,

How can I append the sequences of two fastq files?

Suppose we have two fastq files:

**file1.fastq**

@HEADER
CTCAGNTTGG
+
AAAAA#EEEE
@HEADER
GTGAGTTTAG
+
AA<AA#EE<E

**file2.fastq**

@HEADER
CTTTA
+
#EEEA
@HEADER
GTGAG
+
A#E<E

**result.fastq = reads of file2.fastq prepended to the reads of file1.fastq**

@HEADER
CTTTACTCAGNTTGG
+
#EEEAAAAAA#EEEE
@HEADER
GTGAGGTGAGTTTAG
+
A#E<EAA<AA#EE<E

How can I do that?

fastq RNA-Seq append merge • 3.2k views

updated 2.3 years ago by adi.rotem ▴ 20 • written 5.1 years ago by debitboro ▴ 270

Why do you need to do this?

To concatenate the files:

cat file1.fastq file2.fastq > file.fastq

5.1 years ago by h.mon 35k

Thank you for your answer, but that is not what I'm looking for. I think the question is clearly formulated: I need to prefix the reads of the first file with the reads of the second file.

5.1 years ago by debitboro ▴ 270

Ah, I see, sorry.

But then, my question is even more pertinent: why do you need to do this?

For a quick and dirty concatenation:

paste -d "" file1.fastq file2.fastq > file.fastq

5.1 years ago by h.mon 35k

I want to concatenate only the read sequences and quality strings, not every component of a record. I need this to adapt my file as input for an in-house UMI deduplication script.

5.1 years ago by debitboro ▴ 270

Then you should script a solution yourself with awk, perl or python; it shouldn't be too difficult.
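
A minimal sketch of such a script in awk, assuming both files hold the same number of reads in the same order and that every record spans exactly four lines (no wrapped sequences). It reads `file1.fastq` as the main input, pulls the matching line from `file2.fastq` with `getline`, and prefixes only the sequence and quality lines. The variable name `prefix_file` is made up for this illustration, and the sample records are copied from the question.

```shell
# Sample input copied from the question.
printf '%s\n' '@HEADER' 'CTCAGNTTGG' '+' 'AAAAA#EEEE' \
              '@HEADER' 'GTGAGTTTAG' '+' 'AA<AA#EE<E' > file1.fastq
printf '%s\n' '@HEADER' 'CTTTA' '+' '#EEEA' \
              '@HEADER' 'GTGAG' '+' 'A#E<E' > file2.fastq

# Walk through file1.fastq and fetch the matching line of file2.fastq
# with getline, so both files advance in lockstep.
awk -v prefix_file=file2.fastq '
{
    # Stop with an error if the prefix file runs out of lines first.
    if ((getline other < prefix_file) <= 0) {
        print "error: " prefix_file " has fewer lines than the main input" > "/dev/stderr"
        exit 1
    }
    if (NR % 4 == 2 || NR % 4 == 0)
        print other $0   # sequence or quality line: file2 part, then file1 part
    else
        print $0         # header or "+" line: keep the file1 version unchanged
}' file1.fastq > result.fastq
```

Headers and `+` lines come from `file1.fastq` here; replacing `print $0` with `print other` in the `else` branch would keep the `file2.fastq` headers instead.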

5.1 years ago by h.mon 35k

score 2 · Accepted Answer · 2019-11-20

5.1 years ago

debitboro ▴ 270

I got it, simply:

paste -d '\n' file2.fastq file1.fastq | sed -n 'p;n;n;N;s/\n//p' > result.fastq

Thank you, h.mon, for your help.
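
The pipeline above can be unpacked into two steps. The sketch below runs it on the sample records from the question and writes the interleaved stream to an intermediate file so it can be inspected (`interleaved.txt` is a name chosen here for illustration). It assumes four-line records and the same read order in both files.

```shell
# Sample input copied from the question.
printf '%s\n' '@HEADER' 'CTCAGNTTGG' '+' 'AAAAA#EEEE' \
              '@HEADER' 'GTGAGTTTAG' '+' 'AA<AA#EE<E' > file1.fastq
printf '%s\n' '@HEADER' 'CTTTA' '+' '#EEEA' \
              '@HEADER' 'GTGAG' '+' 'A#E<E' > file2.fastq

# Step 1: interleave the two files line by line, file2 first. Each record
# becomes 8 lines: header2, header1, seq2, seq1, +, +, qual2, qual1.
paste -d '\n' file2.fastq file1.fastq > interleaved.txt

# Step 2: consume the interleaved stream four lines per sed cycle.
#   p         print the current line (file2 header, or the first "+")
#   n         load the next line without printing (file1 header, or second "+")
#   n         load the next line (file2 sequence or quality)
#   N         append the following line (file1 sequence or quality)
#   s/\n//p   delete the embedded newline and print the joined line
sed -n 'p;n;n;N;s/\n//p' interleaved.txt > result.fastq
```

Because the first `p` fires on the `file2.fastq` line of each pair, the headers in `result.fastq` are those of `file2.fastq`.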

5.1 years ago by debitboro ▴ 270

Thanks so much, this was very helpful. I have a very similar goal, with one difference: I'd like to print the header of file 1 (in your example) rather than that of file 2. I've been trying to work out what sed command allows me to do that, but haven't managed yet. Would you have any idea?

So for clarification, this is what I would like:

**file1.fastq**

@HEADER1a
CTCAGNTTGG
+
AAAAA#EEEE
@HEADER1b
GTGAGTTTAG
+
AA<AA#EE<E

**file2.fastq**

@HEADER2a
CTTTA
+
#EEEA
@HEADER2b
GTGAG
+
A#E<E

**result.fastq = reads of file2.fastq prepended to the reads of file1.fastq, keeping the file1.fastq headers**

@HEADER1a
CTTTACTCAGNTTGG
+
#EEEAAAAAA#EEEE
@HEADER1b
GTGAGGTGAGTTTAG
+
A#E<EAA<AA#EE<E
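
One possible adjustment, offered as a sketch rather than a reply from the thread: swapping the first `p` and `n` in the sed script makes it skip the `file2.fastq` header and print the `file1.fastq` header instead. The example below uses the sample records from this comment and assumes four-line records in the same order in both files.

```shell
# Sample input copied from the comment above, with distinct headers per file.
printf '%s\n' '@HEADER1a' 'CTCAGNTTGG' '+' 'AAAAA#EEEE' \
              '@HEADER1b' 'GTGAGTTTAG' '+' 'AA<AA#EE<E' > file1.fastq
printf '%s\n' '@HEADER2a' 'CTTTA' '+' '#EEEA' \
              '@HEADER2b' 'GTGAG' '+' 'A#E<E' > file2.fastq

# Interleave as before (file2 first), then in each sed cycle:
#   n         skip the file2 line (header2, or the first "+")
#   p         print the file1 line (header1, or the second "+")
#   n         load the file2 sequence or quality line
#   N         append the file1 sequence or quality line
#   s/\n//p   delete the embedded newline and print the joined line
paste -d '\n' file2.fastq file1.fastq | sed -n 'n;p;n;N;s/\n//p' > result.fastq
```

The sequence and quality lines are joined exactly as in the accepted answer; only the source of the header and `+` lines changes.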