fastq.gz file goes into a .gz/.cpgz loop
7.2 years ago
S0phia

We used a sequencing service, and they gave us a 100 GB tar file to download. After downloading it, I checked the md5sum and it matches theirs. But after extracting the tar file I found fastq.gz files inside a folder, and when I run gunzip -c filename.fastq.gz | head I get a "not in gzip format" error. Running file filename.fastq.gz says "data" (not "gzip compressed data" as I would expect). When I double-click a fastq.gz file, it goes into a .gz/.cpgz loop. Is it possible that they gave us corrupt files?

RNA-Seq

What is the output of file xxx.tar and file xxxx.gz?

You can also ask for help from the service provider.

file reports "POSIX tar archive (GNU)" for the tar file and "data" for the .gz file. I've contacted them but have had no answer so far, so I wanted to ask here what else I can try. Thank you.

How did you extract the tar file, and on what OS?

It is possible that the file was corrupted during the download if it was not transferred in binary mode. Since the md5sum matches, though, that possibility is slim.
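
A matching checksum only proves that the archive arrived exactly as the provider sent it; it says nothing about whether the files inside were valid when they were packed. For reference, here is a minimal Python sketch of the same check (md5sum on Linux or md5 on macOS does this from the shell). The script name in the usage comment is hypothetical. It reads the file in chunks so that a 100 GB archive is never loaded into memory at once:

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks (1 MiB by
    default) so that even a 100 GB archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python md5check.py archive.tar <md5 from provider>
    path, expected = sys.argv[1], sys.argv[2].strip().lower()
    actual = md5_of_file(path)
    print("OK" if actual == expected else f"MISMATCH: got {actual}")
```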

I just double-clicked the tar file on my Mac (macOS Sierra). Maybe I should try extracting it some other way. Thank you.
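
One way to tell whether the archive itself or the double-click extraction is at fault is to look inside the tar without extracting anything. Below is a minimal Python sketch using the standard tarfile module; the script name and delivery.tar are placeholders. If the .gz members already lack the gzip magic bytes 1f 8b inside the archive, then Archive Utility is probably not the culprit:

```python
import sys
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def check_tar_members(tar_path):
    """Yield (member name, starts_with_gzip_magic) for each regular *.gz
    file in the archive. Only the first two bytes of each member are read;
    nothing is written to disk."""
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".gz"):
                fh = tar.extractfile(member)
                yield member.name, fh.read(2) == GZIP_MAGIC


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python check_tar.py delivery.tar
    for name, ok in check_tar_members(sys.argv[1]):
        print(f"{name}: {'gzip' if ok else 'NOT gzip'}")
```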

Is the file gzipped at all? Perhaps the extension is .fastq.gz but it is simply a plain FASTQ file. Try head -4 filename.fastq.gz. Also check gzip integrity (gzip -tv <input.gz>) and the stored CRC and sizes (gzip -lv <input.gz>). An output of "data" from the file command means that file could not determine what the content is.
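
For anyone who prefers to script these checks, here is a minimal Python sketch; the script and function names are hypothetical, not something used in this thread. first_record() mimics head -4 but decompresses first when the file really starts with the gzip magic bytes, and gzip_is_intact() decompresses the whole stream and discards the output, which is roughly what gzip -t does:

```python
import gzip
import sys
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def first_record(path, n_lines=4):
    """Return the first n_lines of a FASTQ file, like `head -4`, but
    decompressing first if the file really starts with the gzip magic."""
    with open(path, "rb") as fh:
        is_gz = fh.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gz else open
    # errors="replace" so binary junk shows up as U+FFFD instead of crashing
    with opener(path, "rt", errors="replace") as fh:
        return [fh.readline().rstrip("\n") for _ in range(n_lines)]


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress the whole stream and discard the output, roughly what
    `gzip -t` does. Returns False for a missing gzip header, a CRC or
    length mismatch, corrupt deflate data, or a truncated file."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python peek_fastq.py filename.fastq.gz
    for line in first_record(sys.argv[1]):
        print(line)
    print("gzip OK" if gzip_is_intact(sys.argv[1]) else "gzip NOT OK")
```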

I tried to look at the content with head, and it shows gibberish (lots of question marks, some numbers and letters). I tried renaming the file to filename.fastq to see what happens, and it still shows gibberish. As for the other commands you suggested, I get "gzip: filename.fastq.gz: not in gzip format", "filename.fastq.gz: NOT OK" and "not in gzip format". I guess at this point it's clear that these files are not gzip files, even though their names suggest they are. Thank you very much for your help.

Could you please paste the result of

file xxxx.gz

This is exactly what it says:

filename.fastq.gz: data

Since you are on macOS, try The Unarchiver from the App Store. It is supposed to handle several formats, including cpgz. My guess (from googling) is that you might have run into the problem explained here: http://osxdaily.com/2013/02/13/open-zip-cpgz-file/. Let us know if any of these methods works, for future reference.

7.2 years ago
S0phia

Finally! I wanted to leave an update here so that others who run into this problem can use this post as a reference. I heard back from the sequencing provider, and they had to redo the fastq files (I'm not sure exactly what they redid, but that's what they told me). The new tar file was half the size of the original, and the fastq.gz files all behave normally: I could simply double-click one and it turned into a readable fastq file, and the file command returned "gzip compressed data, extra field" for every file. I guess their gzip step went wrong the first time around. Thank you so much for all your help.
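
The "extra field" in that file output refers to the FEXTRA bit (0x04) of the FLG byte in the gzip header defined by RFC 1952. For anyone who wants to compare the header of a good file with a bad one, here is a minimal Python sketch that decodes the fixed 10-byte header and lists the two-letter IDs of any extra subfields (the function name is hypothetical). BGZF files written by bgzip/htslib, for example, carry a "BC" subfield; whether that is what the provider used here is only a guess.

```python
import struct
import sys

# FLG bits from RFC 1952, section 2.3.1
FLAG_NAMES = {0x01: "FTEXT", 0x02: "FHCRC", 0x04: "FEXTRA",
              0x08: "FNAME", 0x10: "FCOMMENT"}


def gzip_header_info(path):
    """Decode the fixed gzip header and, if FEXTRA is set, return the
    two-letter IDs (SI1 SI2) of the extra subfields."""
    with open(path, "rb") as fh:
        header = fh.read(10)  # ID1 ID2 CM FLG MTIME(4) XFL OS
        if len(header) < 10 or header[:2] != b"\x1f\x8b":
            raise ValueError(f"{path}: not a gzip file")
        flg = header[3]
        flags = [name for bit, name in FLAG_NAMES.items() if flg & bit]
        subfields = []
        if flg & 0x04:
            # XLEN (little-endian uint16), then XLEN bytes of subfields,
            # each laid out as SI1 SI2 LEN(2 bytes) followed by LEN bytes of data
            (xlen,) = struct.unpack("<H", fh.read(2))
            extra = fh.read(xlen)
            pos = 0
            while pos + 4 <= len(extra):
                subfields.append(extra[pos:pos + 2].decode("latin-1"))
                (sublen,) = struct.unpack("<H", extra[pos + 2:pos + 4])
                pos += 4 + sublen
    # CM == 8 means deflate, the only method in common use
    return {"method": header[2], "flags": flags, "extra_subfields": subfields}


if __name__ == "__main__" and len(sys.argv) > 1:
    for p in sys.argv[1:]:
        print(p, gzip_header_info(p))
```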
