concatenate using cat
8.6 years ago
snp87 ▴ 80

Hello! I have just started working with some RNA-seq data that was generated on a NextSeq. Each sample produced 8 files (1_L001_R1_001.fastq.gz, L001_R2, L002_R1, L002_R2, L003_R1, L003_R2, L004_R1 and L004_R2). I am trying to use cat to concatenate the files, but it keeps saying command not found. Can someone help with how this could be done?

Thanks so much!

RNA-Seq sequencing • 3.3k views

You should be concatenating R1 and R2 files separately (and in the same order) to avoid issues with mis-ordered pairs. Processing the pairs in existing pieces can allow you to do things in parallel. Data can then be merged at the BAM level. Something to consider.
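A minimal sketch of the point above, using tiny made-up FASTQ chunks in a scratch directory (the read names and the `_merged` output names are assumptions for illustration, not files from this thread). It concatenates R1 and R2 separately with the lanes in the same order, then checks that the read IDs still line up between the two merged files:

```shell
# Scratch directory with stand-in chunks: one 4-line read per lane and mate.
tmp=$(mktemp -d)
for lane in L001 L002 L003 L004; do
  for mate in R1 R2; do
    printf '@read_%s %s:N:0\nACGT\n+\nIIII\n' "$lane" "${mate#R}" \
      | gzip > "$tmp/1_${lane}_${mate}_001.fastq.gz"
  done
done

# Concatenate each mate on its own, listing the lanes in the same order.
for mate in R1 R2; do
  cat "$tmp/1_L001_${mate}_001.fastq.gz" "$tmp/1_L002_${mate}_001.fastq.gz" \
      "$tmp/1_L003_${mate}_001.fastq.gz" "$tmp/1_L004_${mate}_001.fastq.gz" \
      > "$tmp/1_${mate}_merged.fastq.gz"
done

# Header lines are every 4th line starting at line 1; keep only the read ID.
r1_ids=$(gunzip -c "$tmp/1_R1_merged.fastq.gz" | awk 'NR % 4 == 1 { print $1 }')
r2_ids=$(gunzip -c "$tmp/1_R2_merged.fastq.gz" | awk 'NR % 4 == 1 { print $1 }')

if [ "$r1_ids" = "$r2_ids" ]; then
  echo "mates are in sync"
else
  echo "mates are out of order"
fi
```

On real data the same ID comparison can be run on the first few thousand reads rather than the whole file.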


Can you show us the command you're using?

cat 1_L001_R1_001.fastq.gz1_L002_R1_001.fastq.gz1_L003_R1_001.fastq.gz1_S3_L004_R1_001.fastq.gz > 1_R1_001.fastq.gz

Can you also post the error?


show me the output of

which cat

and

echo "A" | cat


Thank you to everyone for the quick replies. It says command not found.

Pierre, the outputs are as follows: /bin/cat and A


Based on this output you should not get a command not found error.


If the command you posted above is correct (after I formatted it), the problem may be that you don't have spaces between the file names. So try this:

cat 1_L001_R1_001.fastq.gz 1_L002_R1_001.fastq.gz 1_L003_R1_001.fastq.gz 1_S3_L004_R1_001.fastq.gz > 1_R1_001.fastq.gz

Sorry, I thought there weren't supposed to be spaces. Thanks so much!


Some applications (e.g. HISAT2, TopHat) expect multiple input files for the same mate (all R1 files, or all R2 files) to be given as a comma-separated list. For a system command like cat, you need to separate the input files with spaces so that the shell passes them as separate files; cat joins their contents and the redirect (>) writes the result to a new file.
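To make the difference concrete, here is a small sketch (bash assumed). `count_args` is a hypothetical helper that only reports how many arguments the shell handed to it, and the file names are placeholders that do not need to exist. The last two lines show one way to build a comma-separated list for tools that want that form:

```shell
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

# Without spaces the shell sees a single, very long file name.
count_args 1_L001_R1_001.fastq.gz1_L002_R1_001.fastq.gz     # prints 1

# With spaces each file is its own argument, which is what cat needs.
count_args 1_L001_R1_001.fastq.gz 1_L002_R1_001.fastq.gz    # prints 2

# Building a comma-separated list from a bash array by joining on IFS.
files=(1_L001_R1_001.fastq.gz 1_L002_R1_001.fastq.gz)
r1_list=$(IFS=,; echo "${files[*]}")
echo "$r1_list"
```

The joined string in `r1_list` is a single argument, which is why it suits aligners that parse the commas themselves but not cat.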

8.6 years ago
ivivek_ngs ★ 5.2k
cat input_dir/*R1*fastq.gz > path_to_output_dir/combined_R1.fastq.gz

cat input_dir/*R2*fastq.gz > path_to_output_dir/combined_R2.fastq.gz

However you do not need to do that if you want to align or run quantification of transcripts. Most tools can accept the chunk files. Even for QC it should be fine. The aligned file can be created on the fly with all the chunked fastq.gz files.

Edit: Yes, genomax is correct. Since the data are paired-end, the read mates should be concatenated separately (all R1 files together, all R2 files together).
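For several samples, the glob approach can be wrapped in a loop. This is only a sketch under assumed naming: the sample IDs 1 and 2, the `input_dir`/`output_dir` layout and the `<sample>_<lane>_<mate>_001.fastq.gz` pattern are placeholders, and the demo files are created on the spot. Globs expand in sorted order, so L001 to L004 come out in the same order for R1 and R2; printing the expansion first is a cheap way to confirm that the pattern matches only what you intend.

```shell
# Stand-in data: two samples, two lanes, both mates.
tmp=$(mktemp -d)
mkdir "$tmp/input_dir" "$tmp/output_dir"
for sample in 1 2; do
  for lane in L001 L002; do
    for mate in R1 R2; do
      printf '@s%s_%s\nACGT\n+\nIIII\n' "$sample" "$lane" \
        | gzip > "$tmp/input_dir/${sample}_${lane}_${mate}_001.fastq.gz"
    done
  done
done

for sample in 1 2; do
  for mate in R1 R2; do
    # Show what the glob expands to before trusting it.
    printf 'merging: %s\n' "$tmp/input_dir/${sample}"_L00?_"${mate}"_001.fastq.gz
    cat "$tmp/input_dir/${sample}"_L00?_"${mate}"_001.fastq.gz \
      > "$tmp/output_dir/${sample}_${mate}.fastq.gz"
  done
done
```

A tighter pattern such as `_L00?_` is used here instead of `*R1*` so that an unrelated file with "R1" somewhere in its name is not swept in.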


don't use zcat, just cat. zcat would uncompress the fastq.gz files.
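A quick way to see the difference, sketched with throwaway files (the names are made up). `gzip -dc` is used here as a portable stand-in for zcat. The output of plain cat is still a valid gzip file, while redirecting decompressed output into a file named `.gz` produces plain text with a misleading extension:

```shell
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip > "$tmp/L001.fastq.gz"
printf '@r2\nTTTT\n+\nIIII\n' | gzip > "$tmp/L002.fastq.gz"

# cat keeps the data compressed.
cat "$tmp"/L00[12].fastq.gz > "$tmp/good.fastq.gz"

# Decompressing while concatenating writes plain text into a .gz-named file.
gzip -dc "$tmp"/L00[12].fastq.gz > "$tmp/mislabeled.fastq.gz"

# gzip -t tests integrity and exits non-zero for input that is not gzip.
if gzip -t "$tmp/good.fastq.gz" 2>/dev/null; then good_status=valid; else good_status=invalid; fi
if gzip -t "$tmp/mislabeled.fastq.gz" 2>/dev/null; then bad_status=valid; else bad_status=invalid; fi
echo "cat output: $good_status; decompressed output: $bad_status"
```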


Ah yes, true. I wrote it in a hurry and have edited it. I am still curious what the OP wants to achieve by creating one file, though; the approach should be whichever is most memory efficient.


I thought it might be better to concatenate first before aligning. Would you recommend to not do this?


If you don't want to deal with separate files/processes then sure. Either way would be fine. If you are going to trim data make sure you use a paired-end aware trimming program and trim the files in pairs (R1/R2).


No, it is not required. Even for trimming you can pass the chunks and then feed the trimmed chunks to the aligner. Just give the programs a proper input pattern so that they recognize your R1 and R2 chunks separately.


Thanks so much for the suggestions.

8.6 years ago
chen ★ 2.5k

You cannot cat two gzipped files, because it will break the gzip format.

gunzip them first and cat the unzipped files


This is incorrect. Concatenated gzipped files are, in fact, valid. There are a few specific programs which fail on concatenated gzipped files, but that is due to noncompliant gzip implementation, as far as I understand it. Mainstream gzip implementations handle it just fine.

But don't take my word for it - try it with gzip, and become a true believer!
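For anyone who wants to try it, a minimal check along these lines should do (the file names are arbitrary and live in a temporary directory). It compresses two small pieces separately, joins them with cat, and confirms that gzip accepts the result and returns both pieces in order:

```shell
tmp=$(mktemp -d)
printf 'first\n'  | gzip > "$tmp/a.gz"
printf 'second\n' | gzip > "$tmp/b.gz"

# Join the two compressed files byte for byte.
cat "$tmp/a.gz" "$tmp/b.gz" > "$tmp/ab.gz"

# Integrity test, then decompress the joined file.
gzip -t "$tmp/ab.gz" && echo "ab.gz passes the gzip integrity test"
combined=$(gunzip -c "$tmp/ab.gz")
printf '%s\n' "$combined"
```

The joined file is a sequence of gzip members, and gzip decompresses them one after another as if they were a single stream.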


You are correct. I just tried to cat gz files, and it worked.

Thanks for your correction, man!
