Concatenate 24 strand-specific RNA-seq FASTQ libraries in Linux?
2.9 years ago
slin023 • 0

Hi, I am looking for suggestions on how to concatenate 24 strand-specific RNA-seq libraries in Linux. I have tried a few tricks with cat, but nothing made sense. So far the only command that worked was ls *_1.fq.gz | sort | xargs cat > CG1-1_1.fq.gz. However, when I gunzip the concatenated .fq.gz file, it shows

gzip: 80OF_01.fq.gz: invalid compressed data--crc error
gzip: 80OF_01.fq.gz: invalid compressed data--length error

which suggests that the concatenated .fq.gz file is corrupted. Since these are stranded libraries, the FASTQ files also have to be concatenated in a fixed order.

For example:

cat control_401_01.fastq.gz control_402_01.fastq.gz control_403_01.fastq.gz > control_01.fastq.gz
cat control_401_02.fastq.gz control_402_02.fastq.gz control_403_02.fastq.gz > control_02.fastq.gz
...

and so on for the other libraries.
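A minimal sketch of doing this for every sample in one loop, assuming the <sample>_<library>_01/02.fastq.gz naming from the example above; the second sample name (treated) and the merged/ output directory are hypothetical, and the first loop only builds toy inputs in a scratch directory. For paired-end data, what the order mainly has to guarantee is that the _01 and _02 files of a sample are concatenated in the same library order, so the n-th read in one output still pairs with the n-th read in the other. Bash sorts glob matches, and the two lists differ only in their suffix, so they come out in matching order. Outputs keep the .fastq.gz extension because cat on gzipped inputs produces gzipped output.

```shell
#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob

# Toy inputs (hypothetical): <sample>_<library>_01/02.fastq.gz
demo=$(mktemp -d); cd "$demo"
for s in control treated; do
  for lib in 401 402 403; do
    printf '@%s_%s/1\nACGT\n+\nIIII\n' "$s" "$lib" | gzip > "${s}_${lib}_01.fastq.gz"
    printf '@%s_%s/2\nTGCA\n+\nIIII\n' "$s" "$lib" | gzip > "${s}_${lib}_02.fastq.gz"
  done
done

# Outputs go to a separate directory so they can never match the input globs.
mkdir -p merged
for s in control treated; do
  for mate in 01 02; do
    # Glob results are sorted by bash, so library order is identical for 01 and 02.
    files=( "${s}"_*_"${mate}".fastq.gz )
    if (( ${#files[@]} == 0 )); then
      echo "no ${mate} files for ${s}" >&2
      exit 1
    fi
    cat "${files[@]}" > "merged/${s}_${mate}.fastq.gz"
  done
done
```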

Here are the labels of all the libraries from one sample:

[Image: library file names for one sample]

If you have any suggestions, please let me know. Thank you for your time!

RNA-seq

Why do you need to sort the files before merging them? If sorting is not needed, you can try this: cat *_1.fq.gz > CG1-1_1.fq.gz


Listing the files with ls or find first is actually a good idea. There were posts (I cannot find them now) showing that a plain cat * (...) can lead to unwanted behaviour, with the newly generated file being appended to itself. Hence, it is better to first list the files and then use the | xargs cat syntax.
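A sketch of the list-then-xargs approach, assuming a find and sort that support -maxdepth, -print0 and -z (GNU and BSD versions do); the file names and the merged/ directory are hypothetical. Note that find and the > redirection on the xargs side of a pipeline run concurrently, so listing first does not by itself keep the output out of the list; writing the output into a subdirectory that -maxdepth 1 excludes is what guarantees that. If the list is long, xargs may run cat several times, but all runs share the same stdout, so the pieces are still appended in order.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy inputs in a scratch directory (hypothetical names), created out of order.
demo=$(mktemp -d); cd "$demo"
for n in 3 1 2; do
  printf '@read%s\nACGT\n+\nIIII\n' "$n" | gzip > "lib${n}_1.fq.gz"
done

mkdir -p merged
# NUL-separated list, sorted, then concatenated. The output sits in merged/,
# which -maxdepth 1 keeps out of the search, so it can never be an input.
find . -maxdepth 1 -name '*_1.fq.gz' -print0 \
  | sort -z \
  | xargs -0 cat > merged/CG1-1_1.fq.gz
```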


Ah, here is the thread I was referring to: merge large amount of fastq files into a single one


The user needs to make sure that the new file's name does not match the pattern passed to cat. In this case, that would be cat *_1.fq.gz > CG1-1_1.fastq.gz, or CG1-1_R1.fq.gz, or any other output name that does not end in _1.fq.gz.
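A small guard along those lines, written as a hypothetical helper safe_cat: it expands the pattern into an array before the output file exists, and refuses to run when the output name itself matches the pattern. It compares the name exactly as given, and * in [[ == ]] also matches /, so an output such as merged/X_1.fq.gz would be refused too; that errs on the safe side.

```shell
#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob

# Hypothetical helper: concatenate files matching PATTERN into OUT,
# but refuse if OUT would itself match PATTERN (the self-append trap).
safe_cat() {
  local pattern=$1 out=$2
  # $pattern is deliberately unquoted so [[ ]] treats it as a glob.
  if [[ $out == $pattern ]]; then
    echo "refusing: '$out' matches '$pattern'" >&2
    return 1
  fi
  local -a files
  files=( $pattern )   # expanded now (sorted), before OUT is created
  if (( ${#files[@]} == 0 )); then
    echo "no files match '$pattern'" >&2
    return 1
  fi
  cat "${files[@]}" > "$out"
}

# Toy inputs in a scratch directory.
demo=$(mktemp -d); cd "$demo"
printf '@a\nACGT\n+\nIIII\n' | gzip > x_1.fq.gz
printf '@b\nTTTT\n+\nIIII\n' | gzip > y_1.fq.gz

safe_cat '*_1.fq.gz' CG1-1_1.fq.gz || echo "first call refused, as intended"
safe_cat '*_1.fq.gz' CG1-1_R1.fq.gz   # accepted: does not end in _1.fq.gz
```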

2.9 years ago

You can try:
cat $(ls *_1.fq.gz | sort) > control_01.fq.gz
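The command above builds only the mate-1 file. A minimal sketch that derives the mate-2 list from the mate-1 list by swapping the suffix, so both outputs get exactly the same library order. It assumes every X_1.fq.gz has a partner X_2.fq.gz; the library names, the merged/ directory and the output names are hypothetical. Using a bash array instead of $(ls ...) also avoids word splitting on unusual file names.

```shell
#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob

# Toy paired inputs in a scratch directory (hypothetical names).
demo=$(mktemp -d); cd "$demo"
for lib in L2 L1; do
  printf '@%s/1\nACGT\n+\nIIII\n' "$lib" | gzip > "${lib}_1.fq.gz"
  printf '@%s/2\nTGCA\n+\nIIII\n' "$lib" | gzip > "${lib}_2.fq.gz"
done

# Mate-1 files in glob order (sorted by the shell, like ls | sort).
r1=( *_1.fq.gz )
# Same list with the trailing suffix swapped, so mate 2 follows the identical order.
r2=( "${r1[@]/%_1.fq.gz/_2.fq.gz}" )

# Fail early if any partner file is missing.
for f in "${r2[@]}"; do
  [[ -e $f ]] || { echo "missing mate file: $f" >&2; exit 1; }
done

mkdir -p merged
cat "${r1[@]}" > merged/control_01.fq.gz
cat "${r2[@]}" > merged/control_02.fq.gz
```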


I tried your command; unfortunately, when I gunzip the .fq.gz file, it still shows

gzip: 80OF_01.fq.gz: invalid compressed data--crc error
gzip: 80OF_01.fq.gz: invalid compressed data--length error
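For context: concatenating .gz files with plain cat is itself valid. The gzip format allows several compressed members back to back, and gzip decompresses them in sequence. So a CRC or length error after cat most likely means one of the inputs was already damaged, not that the cat step broke anything. A minimal check on toy data (file names are hypothetical):

```shell
#!/usr/bin/env bash
set -euo pipefail

demo=$(mktemp -d); cd "$demo"

# Two tiny FASTQ files, each gzipped on its own.
printf '@r1\nACGT\n+\nIIII\n' | gzip > partA_01.fq.gz
printf '@r2\nTTGA\n+\nIIII\n' | gzip > partB_01.fq.gz

# Byte-level concatenation: the result is a two-member gzip file.
cat partA_01.fq.gz partB_01.fq.gz > merged_01.fq.gz

# gzip -t checks the CRC and length of every member; status 0 means intact.
gzip -t merged_01.fq.gz && echo "merged_01.fq.gz passes gzip -t"

# Decompression returns both records, in input order.
gzip -dc merged_01.fq.gz
```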


Can you try to gunzip and re-gzip only the file 80OF_01.fq.gz, and then cat all the files? Make a backup of your file before you do this.
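A sketch of that suggestion as a hypothetical helper, recompress: it keeps a .bak copy and only replaces the original when decompression succeeds. Re-compressing cannot repair data that is already damaged; if gzip -dc fails with a CRC or length error, the affected file most likely needs to be obtained again from its source.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: back up FILE.gz, decompress it, and re-compress it,
# leaving the original untouched if decompression fails.
recompress() {
  local gz=$1
  local tmp="${gz%.gz}.tmp"
  cp -p "$gz" "${gz}.bak"            # backup first
  if gzip -dc "$gz" > "$tmp"; then
    gzip -c "$tmp" > "$gz"
    rm -f "$tmp"
  else
    echo "decompression failed for $gz; original kept, partial data in $tmp" >&2
    return 1
  fi
}

# Toy usage in a scratch directory (an intact file, so this succeeds).
demo=$(mktemp -d); cd "$demo"
printf '@r1\nACGT\n+\nIIII\n' | gzip > 80OF_01.fq.gz
recompress 80OF_01.fq.gz && echo "re-compressed OK"
```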


So I did some tests. I can gunzip the .fastq.gz files from another sample, but not those from the "80OF" sample, and I can map the non-corrupted .fastq files with STAR. Apparently at least one of the files in the "80OF" sample is corrupted. The FastQC report should reveal which one, correct? Or is there a command that can show which file it is?
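A quicker check than a full FastQC run is gzip -t, which decompresses each file in memory and verifies its CRC and length without writing anything. A sketch that runs it over every input and, for files that pass, also flags a line count that is not a multiple of 4; that second check assumes plain 4-line FASTQ records. The helper name check_fastq_gz and the toy files are hypothetical.

```shell
#!/usr/bin/env bash
set -uo pipefail   # no -e: keep scanning after the first bad file

# Report each file that fails gzip's integrity test, or whose line count
# is not a multiple of 4. Returns 1 if anything was reported.
check_fastq_gz() {
  local f n bad=0
  for f in "$@"; do
    if ! gzip -t "$f" 2>/dev/null; then
      echo "CORRUPT (gzip -t failed): $f"
      bad=1
      continue
    fi
    n=$(gzip -dc "$f" | wc -l)
    if (( n % 4 != 0 )); then
      echo "SUSPECT ($n lines, not a multiple of 4): $f"
      bad=1
    fi
  done
  return "$bad"
}

# Toy data: one good file and one deliberately truncated file.
demo=$(mktemp -d); cd "$demo"
printf '@ok\nACGT\n+\nIIII\n' | gzip > good_01.fq.gz
printf '@x\nACGT\n+\nIIII\n' | gzip > whole.gz
head -c 20 whole.gz > bad_01.fq.gz   # chop off the end to simulate damage
rm whole.gz

check_fastq_gz *.fq.gz > report.txt
status=$?
cat report.txt
```

On real data, running check_fastq_gz *_01.fq.gz *_02.fq.gz in the 80OF directory would list the damaged file(s) by name.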
