Blast an organism against a downloaded database
5.3 years ago
caro-ca ▴ 20

Hi, Biostar community!

I am trying to run blastn on Linux with my assembled genome against an organism that has only one public genome assembly. That assembly comes from the same organism but a different strain. When I ran MUMmer, my genome assembly and the one on NCBI turned out to be quite different. My main goal is to find contaminants in my genome assembly (if there are any). The only database that suits my needs is other_genomic.gz (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/), but when I try to gunzip it, I get this:

gzip: other_genomic.gz: unexpected end of file

What does this mean? I should be able to decompress other_genomic.gz. I hope you can help me out. Thank you in advance!

BLAST ncbi databases Nanopore • 1.4k views

As an alternative to gunzip, you should be able to use zcat:

zcat other_genomic.gz > other_genomic.fa

Thank you for the reply, but unfortunately, it didn't work.

5.3 years ago
Mensur Dlakic ★ 28k

It could mean several things: 1) your gunzip is too old (try gunzip -V; it should be 1.3 or higher); 2) if you downloaded the .gz file with a web browser, browsers sometimes decompress the file on the fly; 3) the file really is incomplete (as the message says, unexpected end of file) or was downloaded in the wrong format.

Type file other_genomic.gz and Linux will tell you what type of file you have. Try opening it with a text editor or simply go with more other_genomic.gz. If option 2 is correct, it will look like a plain FASTA file. If option 3 is correct, the contents will be garbled. You may need to download the file again using wget.
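
To make the distinction between the cases above concrete, here is a minimal sketch that looks only at the first bytes of the file. The function name classify_download is made up for illustration. Note that a result of "gzip" only says the file starts like a gzip stream; it does not prove the download is complete.

```shell
# classify_download FILE: guess the file type from its first bytes only.
# (Hypothetical helper, not part of gzip or any NCBI tool.)
classify_download() {
    local f="$1" magic
    # First two bytes as hex; every gzip stream starts with 1f 8b.
    magic=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        echo "gzip"       # compressed, but possibly still truncated
    elif [ "$(head -c 1 "$f")" = ">" ]; then
        echo "fasta"      # already decompressed, e.g. by a browser
    else
        echo "unknown"    # neither gzip nor FASTA: likely a bad download
    fi
}

# Usage: classify_download other_genomic.gz
```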


You can check that the downloaded file is not corrupted by checking the corresponding md5 checksum (run in the same directory as other_genomic.gz):

md5sum -c other_genomic.gz.md5
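
Since md5sum -c resolves the file name listed inside the .md5 file relative to the current directory, a small wrapper can take care of the directory change. This is only a sketch: verify_md5 is a made-up name, and the wget URL in the comment assumes the checksum file is published next to the database file, which is worth confirming on the FTP listing.

```shell
# verify_md5 PATH/TO/FILE.md5: run md5sum -c from the directory that holds
# both the checksum file and the data file it refers to.
# (Hypothetical helper name.)
verify_md5() {
    local md5file="$1"
    # Subshell so the caller's working directory is left unchanged.
    ( cd "$(dirname "$md5file")" && md5sum -c "$(basename "$md5file")" )
}

# Assumed usage (URL not verified here):
#   wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/other_genomic.gz.md5
#   verify_md5 ./other_genomic.gz.md5
```

A non-zero exit status means either the checksum did not match or one of the two files was not found in that directory.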

Thank you so much for your response. I checked your suggestions: 1) my gzip is version 1.6; 2) I downloaded the .gz file with a web browser. When I type file other_genomic.gz I get:

other_genomic.gz: gzip compressed data, last modified: Thu May 23 04:34:35 2019, from Unix

When I try to do more other_genomic.gz I get:

~/CH12_Contaminacion$ more other_genomic.gz 
--More--(0%)

When I try md5sum -c other_genomic.gz.md5:

md5sum: other_genomic.gz.md5: No such file or directory

For context, the database is about 1 TB in size. When I try to BLAST in Blast2GO, it needs a FASTA file. I downloaded the .gz file and uncompressed it, but instead of a .fasta file I get an executable on Linux.


Note that the file is on the order of 0.3 TB, so even at an optimal 100 Mb/s it would still take 7-8 h to download. I have rarely had a connection stay open for this long. You can resume an interrupted download using the --continue option of wget.
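
For anyone who wants to redo the estimate above with their own numbers, here is a rough sketch. It assumes decimal units (1 GB = 8000 megabits) and no protocol overhead, so it gives a lower bound; estimate_hours is a made-up helper name.

```shell
# estimate_hours SIZE_GB SPEED_MBIT: hours needed to transfer SIZE_GB gigabytes
# at a sustained SPEED_MBIT megabits per second (decimal units, no overhead).
estimate_hours() {
    awk -v gb="$1" -v mbit="$2" 'BEGIN { printf "%.1f\n", gb * 8000 / mbit / 3600 }'
}

estimate_hours 300 100   # 0.3 TB at 100 Mb/s, prints 6.7

# To resume a partial download instead of starting over:
#   wget --continue your_copied_URL
```

The idealized figure of 6.7 h lands in the 7-8 h range once real-world overhead is added, which is why resuming with --continue matters for a file this size.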


The md5 checksum file (other_genomic.gz.md5) has to be downloaded into the same directory as other_genomic.gz, and the md5sum command must be run from that directory.
What do you mean you get an executable?


You have a proper gunzip and the type of your file is correct, so it seems that your download was incomplete - exactly as the error message indicated. Instead of downloading with your browser, copy the link from the right-click menu, and paste it after the wget command:

wget your_copied_URL

Sometimes the last few KBs in a large file take a while to write to disk, and that may have caused the incomplete download. As long as you wait for wget to finish, gunzip should work afterwards.
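
Once wget has finished, one cautious way to proceed is to test the archive before decompressing it, so that a truncated file is caught before a partial FASTA file gets written. The sketch below uses a made-up function name, unpack_if_intact, and keeps the original .gz in place.

```shell
# unpack_if_intact FILE.gz OUT: decompress only if the gzip stream is intact.
# (Hypothetical helper name.)
unpack_if_intact() {
    local gz="$1" out="$2"
    # gzip -t reads the whole stream and verifies it; a truncated download
    # fails here with "unexpected end of file".
    gzip -t "$gz" || return 1
    # -d decompress, -c write to stdout so the .gz file is kept.
    gzip -dc "$gz" > "$out"
}

# Usage after the download has completed:
#   wget your_copied_URL
#   unpack_if_intact other_genomic.gz other_genomic.fa
```

Keep in mind that the integrity test reads the entire file, so on a database of this size it will take a while on its own.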
