I am trying to run blastn with a genome I assembled on Linux against an organism that has only one publicly available assembled genome. That genome comes from the same organism but a different strain. When I ran MUMmer, my assembly and the one on NCBI turned out to be quite different. My main goal is to find contaminants in my genome assembly (if there are any). The only database that fits my needs is other_genomic.gz (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/), but when I try to gunzip it, I get this:
gzip: other_genomic.gz: unexpected end of file
What does this mean? I should be able to decompress other_genomic.gz.
I hope you can help me out.
Thank you in advance!
It could mean several things: 1) your gunzip is too old (try gunzip -V; it should be 1.3 or higher); 2) if you downloaded the .gz file with a web browser, browsers sometimes unzip the file on the fly; 3) the file is really incomplete (like it says, unexpected end of file) or was downloaded in the wrong format.
Type file other_genomic.gz and Linux will tell you what type of file you have. Try opening it with a text editor or simply run more other_genomic.gz. If option 2 is correct, it will look like a plain FASTA file. If option 3 is correct, the contents will be garbled. You may need to download the file again using wget.
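If you want to double-check what file reports, the same test can be sketched by hand: a gzip stream always starts with the two bytes 0x1f 0x8b, while a FASTA file starts with ">". The helper name looks_gzipped below is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper (not part of any tool): succeeds if the file
# starts with the gzip magic bytes 0x1f 0x8b.
looks_gzipped() {
    # Read the first two bytes and print them as hex without spaces.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Example call, assuming the download sits in the current directory:
# looks_gzipped other_genomic.gz && echo "gzip data" || echo "not gzip (maybe already unzipped)"
```

Note that this only looks at the header. A truncated download will still pass, so it distinguishes option 2 from the others but cannot rule out option 3.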
Thank you so much for your response. I checked your comments:
1) gzip 1.6
2) I downloaded the .gz from a web browser.
When I type file other_genomic.gz I get:
other_genomic.gz: gzip compressed data, last modified: Thu May 23 04:34:35 2019, from Unix
When I try to do more other_genomic.gz I get:
~/CH12_Contaminacion$ more other_genomic.gz
--More--(0%)
When I try md5sum -c other_genomic.gz.md5:
md5sum: other_genomic.gz.md5: No such file or directory
As a general overview, the database is 1 TB in size. When I try to BLAST with Blast2GO, it needs a FASTA file. I downloaded the .gz file and uncompressed it, but instead of a .fasta file I get an executable on Linux.
Note that the file is on the order of 0.3 TB, so even at an optimal 100 Mb/s it would still take 7-8 h to download. I have rarely had a connection stay open for that long. You can resume an interrupted download using the --continue option of wget.
The md5 checksum file (same name as the data file, with an extra .md5 extension) has to be downloaded into the same directory as the data file, and the md5sum command has to be run in that directory.
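A minimal sketch of that check, assuming the .md5 file lists the bare file name (other_genomic.gz) without a path. The wrapper name verify_md5 is hypothetical; it just changes into the right directory before calling md5sum -c.

```shell
# Hypothetical wrapper around md5sum -c; takes the path to the .md5 file.
verify_md5() {
    # md5sum -c looks for the file named inside the .md5 file relative to the
    # current directory, so change into that directory first (in a subshell,
    # so the caller's working directory is left alone).
    ( cd "$(dirname "$1")" && md5sum -c "$(basename "$1")" )
}

# Example call, assuming both files were downloaded to the current directory:
# verify_md5 ./other_genomic.gz.md5
```

If the download is complete, md5sum prints the file name followed by OK. A FAILED line means the file on disk does not match what is on the server.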
What do you mean you get an executable?
You have a proper gunzip and the type of your file is correct, so it seems that your download was incomplete - exactly as the error message indicated. Instead of downloading with your browser, copy the link from the right-click menu, and paste it after the wget command:
wget your_copied_URL
Sometimes the last few KBs in a large file take a while to write to disk, and that may have caused the incomplete download. As long as you wait for wget to finish, gunzip should work afterwards.
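To avoid finding out only at the gunzip step, you can test the archive right after the download. Here is a rough sketch: fetch_and_check is a made-up helper name, and your_copied_URL is a placeholder for the link you copied. It relies only on wget --continue and gzip -t.

```shell
# Hypothetical helper: resume (or start) a download, then test the archive.
fetch_and_check() {
    url=$1
    file=$(basename "$url")
    # --continue picks up a partial file of the same name in the current directory.
    wget --continue "$url" || return 1
    # gzip -t reads the whole stream and fails on a truncated or corrupt file.
    gzip -t "$file"
}

# Example call (placeholder URL):
# fetch_and_check your_copied_URL && echo "download looks complete"
```

On a file this size gzip -t will take a while, since it has to read through the entire archive, but it does not write anything to disk.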
As an alternative to gzip, you should be able to use zcat.
Thank you for the reply, but unfortunately, it didn't work.