Process Truncated fastq file
12 months ago

Dear all, I have 150 bp paired-end mRNA data. For one sample, FastQC on the reverse-read (R2) file ran up to 95% and then failed with this error message:

Failed to process file Sample1-mRNA_R2.fastq.gz
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Ran out of data in the middle of a fastq entry.  Your file is probably truncated

I have tried to print the tail of the file using the following command:

zcat Sample1-mRNA_R2.fastq.gz | tail -1

and got the following output:

gzip: Sample1-mRNA_R2.fastq.gz: unexpected end of file

AGGCGTATCTCACTGACTTCCTGTGTCAGTTTGCACAGCAGCCCTGCTATGCCATGTTTTCAGACCATCTCAATGAGAATGAAAAGCGAGTGCTGCAGGCCATTGGCAT

The file appears to be truncated, but the sequencing was done in 2017 and this is the only copy of the file we have.
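One way to see how far the damage goes is to test the gzip stream and count the FASTQ lines that can still be decompressed. The snippet below is only an illustrative sketch: `fastq_line_stats` is a hypothetical helper name, and it assumes standard 4-line FASTQ records with unwrapped sequence lines.

```shell
# Hypothetical helper: reads FASTQ text on stdin and reports how many lines
# were seen, how many whole 4-line records that makes, and how many lines
# are left over (a non-zero leftover means the last record is cut off).
fastq_line_stats() {
  awk 'END { printf "lines=%d records=%d leftover=%d\n", NR, int(NR / 4), NR % 4 }'
}

# Example usage (file name taken from the post):
#   gzip -t Sample1-mRNA_R2.fastq.gz    # non-zero exit status if the gzip stream is damaged
#   gzip -cd Sample1-mRNA_R2.fastq.gz 2>/dev/null | fastq_line_stats
```

A leftover of 0 does not by itself prove the final record is intact, since the cut could fall inside the last quality line, so comparing the lengths of the last sequence and quality lines is still worthwhile.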

Is there a way to process the truncated fastq file for the differential gene expression analysis?

fastqc fastq • 540 views
12 months ago
GenoMax 147k

Is there a way to process the truncated fastq file?

Once data is compromised in some way, you can't be fully confident in the results. That said, you could use repair.sh from BBMap to separate out singleton reads and bring the two files back in sync. You will lose some data, but the remaining pairs can be used.

repair.sh -Xmx4g \
  in1=R1.fastq.gz \
  in2=R2.fastq.gz \
  out1=R1.repaired.fastq.gz \
  out2=R2.repaired.fastq.gz \
  outs=singletons.fastq.gz \
  repair
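Because the gzip stream of R2 is itself cut short, it may help to first rewrite R2 so that it ends on a complete record and then run the repair.sh command above on the result. A minimal sketch, assuming plain 4-line FASTQ records; `keep_complete_records` and the `.salvaged` output name are made up for illustration and are not part of BBMap:

```shell
# Hypothetical helper: reads FASTQ text on stdin and writes only records
# that have all four lines, a header starting with "@", a separator
# starting with "+", and sequence and quality strings of equal length.
keep_complete_records() {
  awk '
    { rec[NR % 4] = $0 }            # lines 1-3 go to rec[1..3], line 4 to rec[0]
    NR % 4 == 0 {
      if (substr(rec[1], 1, 1) == "@" &&
          substr(rec[3], 1, 1) == "+" &&
          length(rec[2]) == length(rec[0]))
        print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
    }'
}

# Example usage (output name is an assumption):
#   gzip -cd Sample1-mRNA_R2.fastq.gz 2>/dev/null \
#     | keep_complete_records \
#     | gzip > Sample1-mRNA_R2.salvaged.fastq.gz
```

The salvaged R2 file can then be given to repair.sh as in2, so that R1 reads whose mates were lost end up in the singleton output.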