Hi, I am looking for suggestions on how to concatenate 24 strand-specific RNA-seq libraries in Linux. I have tried a few tricks with cat but nothing made sense. So far the only command that worked was
ls *_1.fq.gz | sort | xargs cat > CG1-1_1.fq.gz
However, when I gunzip the concatenated .fastq.gz, it showed
gzip: 80OF_01.fq.gz: invalid compressed data--crc error
gzip: 80OF_01.fq.gz: invalid compressed data--length error
which suggests the concatenated .fastq.gz is corrupted. Since these are stranded libraries, the fastq files also have to be concatenated in a fixed order,
e.g.
cat control_401_01.fastq.gz control_402_01.fastq.gz control_403_01.fastq.gz > control_01.fastq.gz
cat control_401_02.fastq.gz control_402_02.fastq.gz control_403_02.fastq.gz > control_02.fastq.gz
...
and so on for the others.
Here are all the labels of libraries from one sample:
If you have any suggestions, please let me know. Thank you for your time!
Why do you need to sort the files before merging them? If sorting is not needed, you can try this:
cat *_1.fq.gz > CG1-1_1.fq.gz
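For what it's worth, gzip files can be concatenated byte for byte and the result is still a valid gzip stream, so cat itself should not be what corrupts the data. Here is a minimal sketch to check this on toy data. The file names and reads are made up for illustration, and it assumes bash and gzip are available.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Two tiny stand-in FASTQ files (hypothetical names and reads).
printf '@read1\nACGT\n+\nIIII\n' | gzip > lane1_1.fq.gz
printf '@read2\nTTGA\n+\nIIII\n' | gzip > lane2_1.fq.gz

# Byte-wise concatenation of gzip files is itself a valid gzip stream.
# The output name deliberately does not end in _1.fq.gz.
cat lane1_1.fq.gz lane2_1.fq.gz > merged_R1.fq.gz

# gzip -t checks integrity (CRC and length) without writing anything.
gzip -t merged_R1.fq.gz

# Each FASTQ record is 4 lines, so 8 lines means 2 reads.
n_lines=$(gzip -dc merged_R1.fq.gz | wc -l)
echo "reads: $((n_lines / 4))"
```

If gzip -t passes on a toy case like this but fails on the real output, it may be worth running gzip -t on each input file separately before merging.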
ls or find-ing is actually a good idea. There were posts (cannot find it now) that plain cat * (...) can lead to unwanted behaviour, appending the newly generated file to itself. Hence, it is better to first list the files and then use the | xargs cat syntax.
Ah, here is the thread I was referring to: merge large amount of fastq files into a single one
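A rough sketch of the list-first approach, on made-up toy files. Writing the result into a separate directory (here called merged, an arbitrary choice) is my own addition so that the output cannot show up in the listing. The -print0, -z and -0 options assume GNU find, sort and xargs.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in inputs (hypothetical names), created out of order on purpose.
for lane in 403 401 402; do
    printf '@lane%s\nACGT\n+\nIIII\n' "$lane" | gzip > "control_${lane}_1.fq.gz"
done

# Put the result in its own directory so it can never be picked up as input.
mkdir merged

# List first, sort, then hand the names to cat.
# NUL-separated names keep unusual file names intact.
find . -maxdepth 1 -name '*_1.fq.gz' -print0 \
    | sort -z \
    | xargs -0 cat > merged/control_R1.fq.gz

# Verify the merged file and show the order the lanes ended up in.
gzip -t merged/control_R1.fq.gz
order=$(gzip -dc merged/control_R1.fq.gz | grep '^@lane' | paste -sd' ' -)
echo "$order"
```

The same command with '*_2.fq.gz' should give the second mate in the same lane order, as long as both mates use the same naming scheme.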
The user needs to make sure that the new file's name doesn't match the pattern used by cat. In this case it would be
cat *_1.fq.gz > CG1-1_1.fastq.gz
or CG1-1_R1.fq.gz or some name which doesn't have _1.fq.gz in the output.
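One way to guard against this, sketched on toy data: a small wrapper that refuses to run when the output file is also among the inputs. merge_gz is a hypothetical helper written for this example, not an existing tool, and the file names are made up. It assumes bash (for the -ef file test).

```shell
#!/usr/bin/env bash
set -euo pipefail

# merge_gz OUTPUT INPUT... : hypothetical helper, not a standard tool.
# Refuses to run if OUTPUT is also one of the INPUT files.
merge_gz() {
    local out=$1
    shift
    local f
    for f in "$@"; do
        # -ef is true when both names refer to the same existing file.
        if [ "$f" -ef "$out" ]; then
            echo "refusing: $out is also an input" >&2
            return 1
        fi
    done
    cat "$@" > "$out"
    gzip -t "$out"
}

workdir=$(mktemp -d)
cd "$workdir"

# Toy paired-end inputs: two lanes, two mates each.
for mate in 1 2; do
    for lane in 401 402; do
        printf '@lane%s/%s\nACGT\n+\nIIII\n' "$lane" "$mate" \
            | gzip > "control_${lane}_${mate}.fq.gz"
    done
done

# The same glob order for both mates keeps read 1 and read 2 in sync.
merge_gz control_R1.fq.gz control_*_1.fq.gz
merge_gz control_R2.fq.gz control_*_2.fq.gz

# A leftover output whose name matches the input glob is rejected.
touch CG1-1_1.fq.gz
if merge_gz CG1-1_1.fq.gz *_1.fq.gz; then
    status=accepted
else
    status=rejected
fi
echo "$status"
```

The R1/R2 output names never match *_1.fq.gz or *_2.fq.gz, so rerunning the merge does not pull an old result back in.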