3.5 years ago
rebeliscu
Something appears to be wrong with one of my fastq files: Blood_ACAGTG_L002_R2_010.fastq.gz
I first noticed an error when trying to trim this file (with its R1 counterpart) with trimmomatic:
java -jar /home/shared/programs/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 15 Blood_ACAGTG_L002_R1_010.fastq.gz Blood_ACAGTG_L002_R2_010.fastq.gz /mnt/bdata/shared/SF10711_exome/gbm_14_009_trimmed.fastq.gz ILLUMINACLIP:NexteraPE-PE.fa:2:30:12 LEADING:8 TRAILING:8 SLIDINGWINDOW:4:20 MINLEN:60
java.io.EOFException: Unexpected end of ZLIB input stream
at java.base/java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:245)
at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:159)
at java.base/java.util.zip.GZIPInputStream.read(GZIPInputStream.java:118)
at org.usadellab.trimmomatic.util.ConcatGZIPInputStream.read(ConcatGZIPInputStream.java:73)
at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.base/java.io.InputStreamReader.read(InputStreamReader.java:181)
at java.base/java.io.BufferedReader.fill(BufferedReader.java:161)
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:326)
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:392)
at org.usadellab.trimmomatic.fastq.FastqParser.parseOne(FastqParser.java:71)
at org.usadellab.trimmomatic.fastq.FastqParser.next(FastqParser.java:179)
at org.usadellab.trimmomatic.threading.ParserWorker.run(ParserWorker.java:42)
at java.base/java.lang.Thread.run(Thread.java:829)
Exception in thread "Thread-1" java.lang.RuntimeException: java.io.EOFException: Unexpected end of ZLIB input stream
at org.usadellab.trimmomatic.threading.ParserWorker.run(ParserWorker.java:56)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream
at java.base/java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:245)
at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:159)
at java.base/java.util.zip.GZIPInputStream.read(GZIPInputStream.java:118)
at org.usadellab.trimmomatic.util.ConcatGZIPInputStream.read(ConcatGZIPInputStream.java:73)
at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.base/java.io.InputStreamReader.read(InputStreamReader.java:181)
at java.base/java.io.BufferedReader.fill(BufferedReader.java:161)
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:326)
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:392)
at org.usadellab.trimmomatic.fastq.FastqParser.parseOne(FastqParser.java:71)
at org.usadellab.trimmomatic.fastq.FastqParser.next(FastqParser.java:179)
at org.usadellab.trimmomatic.threading.ParserWorker.run(ParserWorker.java:42)
... 1 more
Input Read Pairs: 3860000 Both Surviving: 3102127 (80.37%) Forward Only Surviving: 456443 (11.82%) Reverse Only Surviving: 125247 (3.24%) Dropped: 176183 (4.56%)
TrimmomaticPE: Completed successfully
Looking into this error, I was led to this thread: Error: Help understand Trimmomatic ZLIB input stream error
Trying to unzip the file, I get an 'unexpected end of file' error.
When I try to view the contents:
zcat Blood_ACAGTG_L002_R2_010.fastq.gz | tail
gzip: Blood_ACAGTG_L002_R2_010.fastq.gz: unexpected end of file
@HWI-D00328:58:H7EAEADXX:2:2215:11524:35696 2:N:0:ACAGTG
ATCTTGCCCTGCCGCACTGACTACGGCTGCTGCCGCCTTTCTATGGCTGTGCGTCTCATCCCCGCTGTCCATCTGGGAGATGGGGTCTTCCTTGTGGCGCC
+
CCCFFFFFHHHGHJJJJBIGJGIIJJJJJJFI9BFFHIJIIGGGIIGEGE;AA?B>CDEEEDD'3=BBCDAFDCDDD2<5?CCBD9<C:@CCDDAC@BD@B
@HWI-D00328:58:H7EAEADXX:2:2215:11723:35707 2:N:0:ACAGTG
TAGATTGTTAGAAAGATCCAAGTATTAAGATCTAGGGTGGCTAACTTTTCACAGACAAAAAGCTTGTTTGTAAGGTCATTTACTATACCCTTAATTCAGGA
+
==+2<@AAB?<A?BBBBB9+3=34>A,>CB4?=AC?9110;AA>ABBBB7*=AA3=>BBB2;>3A76>>BBABAA=7>?@@@>>@>@@B>>=;;?>B=;?3
@HWI-D00328:58:H7EAEADXX:2:2215:11603:35719 2:N:0:ACAGTG
When I do the same for a different, working fastq file, I get:
zcat Blood_ACAGTG_L002_R1_010.fastq.gz | tail
+
@@BFFFFFHHHHHJJJIIIJCHIIJEGHIJGJJGHJJIIJJJJJJJFGHIJJJJJJEHJJJJIJHHHFFFFFE>>>BCDDBCCDDDDDDDDDDDDC9CCDC
@HWI-D00328:58:H7EAEADXX:2:2215:18033:58714 1:N:0:ACAGTG
CTTCTTTCCTTTTAGGTGGTTCTAGATGTTGGTTGTGGATCAGGAATCCTGTCATTTTTTGCTGTACAGGCTGGAGCTAGGACAGTTTATGCAGTTGAAGC
+
@@@FFFFFHFGHH>FG<CFCEDHHCHHGHCGGCGEHCGGGHBFH@?GHDHEDFGIGI@DHHGIJGIG;?ECDBB66;A@>?CC=B@CDC>CD5::AC>>>@
@HWI-D00328:58:H7EAEADXX:2:2215:18170:58720 1:N:0:ACAGTG
GCAAAGTAGTCAGGAATCGATCTCGTGAAGCCCGCAAGGACCGAACACCCCCACCCCGATTTAGACCTACGGGTGCTGCCCCATGTCTCCCACCAAAGCCC
+
?@<DDD2A?=CDF@CGBFGICGIFGF@AE<FFFFIIFFBDFD:AFEEEC4ABDC<@BBBBB?BBBBBBBB9>B9<?@BB?@@B9?(:@@AA?BBB(<39?<
Is there anything obvious that differs between the ends of these two files and that can be fixed manually?
Thanks in advance!
Your data files are probably corrupt. Re-download them if you can. You could try to repair them, but once you fix one error there may be another, so it is likely not worth the hassle.
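If re-downloading really is not possible, here is a minimal sketch of salvaging whatever still decompresses. It assumes a standard 4-line FASTQ layout; `salvage_fastq` and the output file name in the usage comments are hypothetical names chosen for illustration. It keeps only complete records and drops a final record whose quality line was cut short.

```shell
# Hypothetical helper: print only the complete 4-line FASTQ records that can
# still be decompressed from a (possibly truncated) .gz file.
salvage_fastq() {
    # gzip -dc writes everything it can decode before it hits the truncation.
    # awk collects header, sequence and separator, then prints the record only
    # if the quality line is as long as the sequence line (i.e. not cut off).
    gzip -dc "$1" 2>/dev/null | awk '
        NR % 4 == 1 { h = $0 }
        NR % 4 == 2 { s = $0 }
        NR % 4 == 3 { p = $0 }
        NR % 4 == 0 { if (length(s) == length($0)) print h "\n" s "\n" p "\n" $0 }'
}

# Usage with the file from the question (output name is an assumption):
# salvage_fastq Blood_ACAGTG_L002_R2_010.fastq.gz | gzip > R2_salvaged.fastq.gz
```

The R1 file would then have to be cut down to the same number of reads so the pairs still match, and any reads lost after the truncation point are gone for good, so re-downloading remains the safer fix.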
I agree that this is best fixed by downloading the file again, because your file is truncated. The most common causes of truncation are running out of disk space and a broken connection. Sequencing centers also tend to delete files after a few months, so getting the download right the first time is extra important. My recommendations: always download important files on the command line using scp, aspera, wget, curl or another reliable download manager, depending on the transfer protocol offered. Do check file checksums like md5, sha..., if provided. If in doubt, prefer a sequencing center that provides convenient methods of data transfer.
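A quick way to catch a truncated download before running any tools on it is to test the gzip stream itself. This is only a sketch: `check_gz` is a hypothetical helper name, and the checksum file name in the comments is an assumption, since the thread does not say what the sequencing center provides.

```shell
# Hypothetical helper: report whether a gzip file decompresses cleanly.
# gzip -t reads the whole stream and verifies it without writing any output.
check_gz() {
    if gzip -t "$1" 2>/dev/null; then
        echo "OK: $1"
    else
        echo "CORRUPT: $1"
        return 1
    fi
}

# Usage on the files from the question:
# check_gz Blood_ACAGTG_L002_R1_010.fastq.gz
# check_gz Blood_ACAGTG_L002_R2_010.fastq.gz

# If a checksum list is provided (file name assumed here), md5sum -c
# compares every listed file against its recorded checksum:
# md5sum -c md5sums.txt
```

A file that passes `gzip -t` is a complete gzip stream, but only a matching checksum confirms it is identical to what the sequencing center sent.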