9.0 years ago
nalandaatmi
Hi All,
Recently I concatenated two fastq files from one library belonging to the same sample but loaded in different lanes, using the following command.
cat Sample01_L001_R1.all.fastq.gz Sample01_L006_R1.all.fastq.gz > Sample01_L001-6_R1.all.fastq.gz
Sample fastq file content:
@HISEQ:137:C8W59ACXX:1:1101:1183:2157 1:N:0:TATGGC
GTATCATTAAAACTTTTACGATCAATCTTTTTAATAAGAACTAAATTATAATAAAATCCATATGTTGCCACAGGCGGGAAAAAAAAAAGGAAGGAAAAAAA
+
BBBFFFFFFFFFFBIIIIIIFFIIIIIIIFFIFFF<FBBFFFIIIIIIIIFIIIFFIBFFFIIB<BF<FFFIBFFBBB'7BBB<7<BF7<'077'07BB7<
To validate that all the lines were copied to the output fastq file Sample01_L001-6_R1.all.fastq.gz, I counted the number of header lines in each fastq file using the following commands.
$ zcat Sample01_L001_R1.all.fastq.gz | grep '@HISEQ:137' | wc -l
37,955,286
$ zcat Sample01_L006_R1.all.fastq.gz | grep '@HISEQ:137' | wc -l
18,385,272
$ zcat Sample01_L001-6_R1.all.fastq.gz | grep '@HISEQ:137' | wc -l
55,587,340
The expected count is 56,340,558 (37,955,286 + 18,385,272).
Why does the number of fastq headers differ from the expected count?
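To make the size of the discrepancy explicit, the arithmetic can be sketched in the shell. The three counts are the ones reported above; the variable names are only illustrative.

```shell
# Header counts reported above (thousands separators removed).
l001=37955286      # Sample01_L001_R1.all.fastq.gz
l006=18385272      # Sample01_L006_R1.all.fastq.gz
merged=55587340    # Sample01_L001-6_R1.all.fastq.gz

# Expected total is the sum of the two inputs.
expected=$(( l001 + l006 ))
# Shortfall is how many headers the merged file is missing.
missing=$(( expected - merged ))

echo "expected=$expected merged=$merged missing=$missing"
```

With these numbers the merged file is short by 753,218 headers.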
Hard to say. Try it again. I agree that the counts should be the same; it is possible that the file was corrupted in some manner.
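One way to check for the kind of corruption mentioned above is gzip's built-in integrity test. This is a minimal sketch, assuming the files are ordinary gzip archives; `check_gz` is a hypothetical helper name, not part of any tool.

```shell
# check_gz FILE: report whether FILE passes gzip's integrity test.
# gzip -t decompresses in memory, verifies each member's checksum,
# and exits non-zero on a truncated or damaged archive.
check_gz() {
    if gzip -t -- "$1" 2>/dev/null; then
        echo "OK: $1"
    else
        echo "CORRUPT: $1"
        return 1
    fi
}

# Example usage (file names from the question):
#   check_gz Sample01_L001_R1.all.fastq.gz
#   check_gz Sample01_L006_R1.all.fastq.gz
#   check_gz Sample01_L001-6_R1.all.fastq.gz
```

A file made by concatenating valid gzip files is itself a valid multi-member gzip file, so a failure on the merged file would point to a problem during the copy and not to the `cat` approach as such.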
Thanks Istvan Albert. I will try it again.
Have you tried counting all lines and dividing by 4 to see what number you get?
To be safe you can also try this instead:
$ zcat seq1.fq.gz seq2.fq.gz | gzip -c > all.fq.gz
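The line-count check suggested above (all lines divided by 4) can be sketched as a small function. It assumes standard four-line fastq records with no wrapped sequence lines; `count_reads` is a hypothetical helper name.

```shell
# count_reads FILE: print the number of reads in a gzipped fastq file,
# computed as total lines / 4. Unlike grepping for the header prefix,
# this does not depend on what the header text looks like.
count_reads() {
    # $(( ... )) also strips the padding some wc implementations add.
    lines=$(( $(gzip -cd -- "$1" | wc -l) ))
    if [ $(( lines % 4 )) -ne 0 ]; then
        echo "warning: $1 has $lines lines, not a multiple of 4" >&2
    fi
    echo $(( lines / 4 ))
}

# Example usage (file names from the question):
#   count_reads Sample01_L001_R1.all.fastq.gz
#   count_reads Sample01_L006_R1.all.fastq.gz
#   count_reads Sample01_L001-6_R1.all.fastq.gz
```

If the first two counts sum to the third, the merge is complete. A warning about a line count that is not a multiple of 4 would suggest a truncated record.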
What are those numbers without the grep (all lines)?
I know this isn't answering your question directly, but you can just supply your fastq files to the aligner; most aligners can merge them for you. I know that STAR does that for sure.
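For STAR, the lane files go to `--readFilesIn` as a comma-separated list, with `--readFilesCommand zcat` for gzipped input. The sketch below only builds and prints such a command. The genome index path is a placeholder, and `join_lanes` is a hypothetical helper.

```shell
# join_lanes FILE...: join the arguments with commas, the form STAR
# accepts for multiple lanes of the same mate in --readFilesIn.
join_lanes() {
    ( IFS=,; echo "$*" )
}

r1=$(join_lanes Sample01_L001_R1.all.fastq.gz Sample01_L006_R1.all.fastq.gz)

# Print the command instead of running it; /path/to/star_index is a
# placeholder for an existing STAR genome index directory.
echo STAR --genomeDir /path/to/star_index \
     --readFilesCommand zcat \
     --readFilesIn "$r1"
```

For paired-end data, a second comma-separated list for the R2 files would follow the first, separated by a space and in the same lane order.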