How to append two FASTQ files?
5.0 years ago
debitboro ▴ 270

Dear Biostars,

How can I append the sequences of two FASTQ files?

Suppose we have two fastq files:

**file1.fastq**

@HEADER
CTCAGNTTGG
+
AAAAA#EEEE
@HEADER
GTGAGTTTAG
+
AA<AA#EE<E

**file2.fastq**

@HEADER
CTTTA
+
#EEEA
@HEADER
GTGAG
+
A#E<E

**result.fastq = reads of file2.fastq prepended to the reads of file1.fastq**

@HEADER
CTTTACTCAGNTTGG
+
#EEEAAAAAA#EEEE
@HEADER
GTGAGGTGAGTTTAG
+
A#E<EAA<AA#EE<E

How can I do that?

fastq RNA-Seq append merge • 3.2k views

Why do you need to do this?

To concatenate the two files:

cat file1.fastq file2.fastq > file.fastq

Thank you for your answer, but that is not what I'm looking for. I think the question is clearly formulated: I need to prefix the reads of the first file with the reads of the second file.


Ah, I see, sorry.

But then, my question is even more pertinent: why do you need to do this?

For a quick and dirty concatenation (this joins every line, including the header and `+` lines):

paste -d "" file1.fastq file2.fastq > file.fastq

I want to concatenate only the read sequences and the quality strings, not every component of a record. I need this to adapt my file as input for an in-house UMI deduplication script.


Then you should script a solution yourself with awk, Perl, or Python; it shouldn't be too difficult.
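For reference, a scripted version could look roughly like the Python sketch below. It is only a sketch under a few assumptions: both files hold strict 4-line FASTQ records (no wrapped sequences), the records are in the same order in both files, and the function names (`read_records`, `prepend_reads`, `prepend_fastq`) are made up for this example.

```python
from itertools import islice, zip_longest


def read_records(lines):
    """Yield (header, sequence, separator, quality) tuples.

    Assumes strict 4-line FASTQ records (hypothetical helper).
    """
    stripped = (line.rstrip("\r\n") for line in lines)
    while True:
        record = tuple(islice(stripped, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("incomplete FASTQ record: %r" % (record,))
        yield record


def prepend_reads(prefix_records, main_records):
    """Prefix each main read and its qualities with the matching prefix read."""
    missing = object()
    for pre, main in zip_longest(prefix_records, main_records, fillvalue=missing):
        if pre is missing or main is missing:
            raise ValueError("the two inputs have different numbers of records")
        # The header is taken from the prefix record here; use main[0]
        # instead to keep the header of the main file.
        yield (pre[0], pre[1] + main[1], "+", pre[3] + main[3])


def prepend_fastq(prefix_path, main_path, out_path):
    """Write the merged records to out_path (paths are supplied by the caller)."""
    with open(prefix_path) as pre, open(main_path) as main, open(out_path, "w") as out:
        for record in prepend_reads(read_records(pre), read_records(main)):
            out.write("\n".join(record) + "\n")
```

With the files from the question, the call would be `prepend_fastq("file2.fastq", "file1.fastq", "result.fastq")`. Working on whole records rather than on raw lines makes it easy to fail loudly when the two files do not contain the same number of reads.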

5.0 years ago
debitboro ▴ 270

I got it, simply:

paste -d '\n' file2.fastq file1.fastq | sed -n 'p;n;n;N;s/\n//p' > result.fastq

Thank you, h.mon, for your help.


Thanks so much, this was very helpful. I have a very similar goal, with one difference: I'd like to print the header of file 1 (in your example) rather than that of file 2. I've been trying to work out what sed command allows me to do that, but haven't managed yet. Would you have any idea?

So for clarification, this is what I would like:

**file1.fastq**

@HEADER1a
CTCAGNTTGG
+
AAAAA#EEEE
@HEADER1b
GTGAGTTTAG
+
AA<AA#EE<E

**file2.fastq**

@HEADER2a
CTTTA
+
#EEEA
@HEADER2b
GTGAG
+
A#E<E

**result.fastq = reads of file2.fastq prepended to the reads of file1.fastq, with the headers of file1.fastq**

@HEADER1a
CTTTACTCAGNTTGG
+
#EEEAAAAAA#EEEE
@HEADER1b
GTGAGGTGAGTTTAG
+
A#E<EAA<AA#EE<E

paste -d '\n' file2.fastq file1.fastq | sed -n 'n;p;n;N;s/\n//p' > result.fastq
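In case it helps to see why moving the `p` works: `paste -d '\n'` interleaves the two files line by line, so each sed cycle sees four lines (file2 line, file1 line, file2 line, file1 line). The first `p` prints whichever of the first two lines is in the pattern space at that moment, and `N;s/\n//p` joins and prints the last two. The Python sketch below only emulates that logic for illustration; it assumes both inputs have the same number of lines, and `interleave` and `join_pairs` are names invented for this example, not part of any tool.

```python
from itertools import islice


def interleave(first_lines, second_lines):
    # Roughly what `paste -d '\n' first second` emits when both inputs
    # have the same number of lines: lines alternate between the inputs.
    for a, b in zip(first_lines, second_lines):
        yield a
        yield b


def join_pairs(interleaved, keep="first"):
    # Each group of four interleaved lines corresponds to one sed cycle:
    #   keep="first"  ~ sed -n 'p;n;n;N;s/\n//p'  (print line 1 of the group)
    #   keep="second" ~ sed -n 'n;p;n;N;s/\n//p'  (print line 2 of the group)
    # The last two lines of the group are always joined and printed.
    it = iter(interleaved)
    while True:
        group = list(islice(it, 4))
        if len(group) < 4:
            return  # trailing partial groups are ignored in this sketch
        line_a, line_b, part_a, part_b = group
        yield line_a if keep == "first" else line_b
        yield part_a + part_b


file1 = ["@HEADER1a", "CTCAGNTTGG", "+", "AAAAA#EEEE"]
file2 = ["@HEADER2a", "CTTTA", "+", "#EEEA"]
for line in join_pairs(interleave(file2, file1), keep="second"):
    print(line)
```

For a FASTQ record the groups alternate between (header, header, sequence, sequence) and (`+`, `+`, quality, quality), which is why the same short sed program handles both halves of each record.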
