FASTQ file error


8.6 years ago

biolab ★ 1.4k

Dear All,

We are submitting our high-throughput sequencing data to NCBI GEO, but GEO has reported two errors in our gzip-compressed FASTQ files.

The first error is:

   sample1_R1.fq.gz
   Line number 1922118: File may be truncated

I used perl -ne '$i++; print if $i==1922118' sample1_R1.fq to extract that line:

AAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGAACTTCTGTTCAGACTCATAACGACTCGTCGTGTGAAGCTGGACATACGTCTGCAAACTAAAATCGGCA

I can't see how it is truncated. What's wrong with this line?
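One way to narrow down a "may be truncated" report is to check two things: whether the gzip stream decompresses to the end without error, and whether the total line count is a multiple of four, since each FASTQ record spans four lines. The Python sketch below does both. The function name check_truncation is hypothetical, and the sketch assumes standard four-line records with no wrapped sequence or quality lines.

```python
import gzip
import zlib


def check_truncation(path):
    """Return (line_count, problem) for a gzip-compressed FASTQ file.

    problem is None when the file decompresses fully and the line count
    is a multiple of 4; otherwise it is a short description.
    Assumption: 4 lines per record (no wrapped sequence/quality lines).
    """
    n = 0
    try:
        with gzip.open(path, "rt") as fh:
            for _ in fh:
                n += 1
    except (EOFError, zlib.error, OSError) as exc:
        # A gzip stream that was cut short typically raises EOFError here.
        return n, f"gzip stream ended early after reading {n} lines: {exc}"
    if n % 4 != 0:
        return n, f"{n} lines is not a multiple of 4; last record is incomplete"
    return n, None
```

Note that 1922118 leaves a remainder of 2 when divided by 4, so the reported line is the sequence line of its record. If the file ends at that line, the "+" and quality lines of the last record are missing, even though the sequence itself looks complete.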

The second error is:

   sample1_R2.fq.gz
   Line number 1985252: quality length does not match sequence length

I used the same command to get that line:

C@@FFFBBDBFAFIIGGHDFECFAHEHFHGIGADHGEGGH?DF<DF?DB?B?<FFFGHGCHCHFEHIGGFA?B2<?CCDDDDCDCD@>CAC:ACDC:A@A

It is exactly 100 characters, which matches the sequence length. What's the problem?
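Looking at a single line in isolation can be misleading, because the validator compares each quality line against the sequence line of the same record. A minimal Python sketch of that comparison follows; find_length_mismatches is a hypothetical helper name, and it again assumes four-line records.

```python
import gzip
from itertools import zip_longest


def find_length_mismatches(path, limit=10):
    """Return up to `limit` tuples (quality_line_number, seq_len, qual_len)
    for records whose sequence and quality strings differ in length.
    Assumption: 4 lines per record; line numbers are 1-based.
    """
    opener = gzip.open if path.endswith(".gz") else open
    mismatches = []
    with opener(path, "rt") as fh:
        lines = (line.rstrip("\r\n") for line in fh)
        # Group the stream into 4-line records; a short final record
        # is padded with None by zip_longest.
        for i, (header, seq, plus, qual) in enumerate(zip_longest(*[lines] * 4)):
            if seq is None or qual is None:
                break  # incomplete final record, nothing to compare
            if len(seq) != len(qual):
                mismatches.append((4 * i + 4, len(seq), len(qual)))
                if len(mismatches) >= limit:
                    break
    return mismatches
```

Since 1985252 is a multiple of 4, the reported line is indeed a quality line under this layout. If an earlier record in the file is missing a line, every later record is shifted, so the lines read as "sequence" and "quality" no longer belong together.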

Could you please give me some suggestions? I would greatly appreciate your help!

fastq • 2.5k views



Do both FASTQ files from sample1 have the same number of reads? Also, the quality string you pasted has 110 bases...
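The read-count comparison suggested here could be sketched in Python as below. The helper names count_reads and compare_pair are hypothetical, and the sketch assumes four lines per record in both files.

```python
import gzip


def count_reads(path):
    """Return (complete_records, leftover_lines) for a FASTQ file,
    assuming 4 lines per record. Handles plain or .gz files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        n_lines = sum(1 for _ in fh)
    return n_lines // 4, n_lines % 4


def compare_pair(r1_path, r2_path):
    """Return True when both mate files hold the same number of
    complete records and neither has leftover lines."""
    n1, extra1 = count_reads(r1_path)
    n2, extra2 = count_reads(r2_path)
    print(f"{r1_path}: {n1} reads, {extra1} leftover lines")
    print(f"{r2_path}: {n2} reads, {extra2} leftover lines")
    return n1 == n2 and extra1 == 0 and extra2 == 0
```

For example, compare_pair("sample1_R1.fq.gz", "sample1_R2.fq.gz") would print both counts and return False if the two files disagree or if either ends partway through a record.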

ADD REPLY • link 8.6 years ago by fanli.gcb ▴ 730


Hi, fanli.gcb, thank you for your reply. I realize the two FASTQ files do not have the same number of lines. This is probably the problem. I will further check. THANKS again!

ADD REPLY • link 8.6 years ago by biolab ★ 1.4k
