8.6 years ago
biolab
Dear All,
We are submitting our high-throughput sequencing data to NCBI GEO, but GEO reported two errors in our gzip-compressed FASTQ files.
The first error is:
sample1_R1.fq.gz
Line number 1922118: File may be truncated
I used perl -ne '$i++; print if $i==1922118' sample1_R1.fq to extract that line:
AAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGAACTTCTGTTCAGACTCATAACGACTCGTCGTGTGAAGCTGGACATACGTCTGCAAACTAAAATCGGCA
I can't see that it is truncated. What is wrong with this line?
The second error is:
sample1_R2.fq.gz
Line number 1985252: quality length does not match sequence length
I used the same command to extract that line:
C@@FFFBBDBFAFIIGGHDFECFAHEHFHGIGADHGEGGH?DF<DF?DB?B?<FFFGHGCHCHFEHIGGFA?B2<?CCDDDDCDCD@>CAC:ACDC:A@A
It is exactly 100 characters, which matches the sequence length. What is the problem?
Could you please give me some suggestions? I would greatly appreciate your help!
Do both FASTQ files from sample1 have the same number of reads? Also, the quality string you pasted has 110 bases...
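One way to check the read counts is to count the lines in each compressed file directly. The sketch below is only an illustration: the function names count_fastq_lines and compare_pair are made up for this example, and it assumes the usual 4-lines-per-record FASTQ layout (no wrapped sequence lines).

```python
import gzip


def count_fastq_lines(path):
    """Return the number of lines in a gzip-compressed FASTQ file."""
    n = 0
    with gzip.open(path, "rt") as handle:
        for _ in handle:
            n += 1
    return n


def compare_pair(r1_path, r2_path):
    """Return (reads_r1, reads_r2, ok) for a paired-end file pair.

    Assumes 4 lines per record. ok is True only if both files have the
    same number of lines and that number is a multiple of 4.
    """
    lines_r1 = count_fastq_lines(r1_path)
    lines_r2 = count_fastq_lines(r2_path)
    ok = lines_r1 == lines_r2 and lines_r1 % 4 == 0
    return lines_r1 // 4, lines_r2 // 4, ok
```

For the files in the question this would be called as compare_pair("sample1_R1.fq.gz", "sample1_R2.fq.gz"). Reading the .gz files themselves, rather than an uncompressed copy, also exercises the compressed stream: if the gzip file was cut off, Python's gzip module should raise an error while reading.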
Hi fanli.gcb, thank you for your reply. I realized that the two FASTQ files do not have the same number of lines. This is probably the problem. I will check further. Thanks again!
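For anyone hitting the same two messages, a record-level check can help locate the problem before resubmitting. The following is a minimal sketch, not GEO's actual validator: first_fastq_error is a hypothetical helper, the messages only imitate the ones quoted above, and it again assumes 4 lines per record. It reads the .fq.gz file itself, since the line lookups in the question were run on the uncompressed sample1_R1.fq, which may not be identical to what was uploaded.

```python
import gzip


def first_fastq_error(path):
    """Return (line_number, message) for the first problem found, or None.

    Line numbers are 1-based. Assumes 4 lines per FASTQ record.
    """
    with gzip.open(path, "rt") as handle:
        line_no = 0  # lines consumed before the current record
        while True:
            record = [handle.readline() for _ in range(4)]
            if record[0] == "":
                return None  # clean end of file
            # readline() returns "" only at end of file
            present = sum(1 for line in record if line != "")
            if present < 4:
                return (line_no + present,
                        "file may be truncated: incomplete record")
            header, seq, plus, qual = (s.rstrip("\r\n") for s in record)
            if not header.startswith("@"):
                return (line_no + 1, "header does not start with '@'")
            if not plus.startswith("+"):
                return (line_no + 3, "separator does not start with '+'")
            if len(seq) != len(qual):
                return (line_no + 4,
                        "quality length does not match sequence length")
            line_no += 4
```

If a line was lost or added somewhere upstream, every later record shifts out of frame, so the first reported line is usually the place to start looking rather than the line a later tool complains about.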