Question

downloading nt databases - cannot extract them all?

9.7 years ago

balasink ▴ 10

I have to make a local database to BLAST my sequences that I got from next-generation sequencing. I'm trying to create a local nucleotide database - the issue is, the nt database on the NCBI ftp server is in 27 parts.

When I downloaded all the parts, I only had room to extract the first 3 files (nt.00, nt.01, nt.02); the rest wouldn't complete. I've read the README files, and they do say to "extract all the files." My desktop has 8 GB of RAM, and disk space has never been an issue before. My C drive has 81.5 GB free right now, and I'm wondering how anyone manages to download and extract all of these files for the complete database. Does everyone have a supercomputer with massive memory?

I have to get this to work as I don't have any other programs (that I can afford) that can do a BLAST for my sequences (each sample having 800+ OTUs).

Help please!

NCBI blast nt BLAST-plus database • 6.7k views

ADD COMMENT • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by balasink ▴ 10

the rest wouldn't complete

What does that mean, what is the error that you got? There is nothing special about downloading the nt databases. Very straightforward task.

ADD REPLY • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by Istvan Albert 101k

Hi, what I meant by "complete" was this: say I downloaded the nt databases and extracted them all (I found a way to make space). The extraction still wouldn't complete, or the .pal file would not stitch all the databases together. So when I tried using -db nt, it wouldn't work. I had to type nt.00, but I have the databases up to nt.03 and was hoping the .pal file would stitch them all together.

I am not sure if, when I extract the databases after the nt.00 file, they each have a nt.0#.pal file. Am I supposed to replace each .pal file with the newly extracted one? It will not let me extract the .pal file because they already exist (from the nt.00 database). So it's either I replace the .pal file each time, or I delete it.

ADD REPLY • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by balasink ▴ 10

OK,

Let's start from the top. Could you provide links to the locations you downloaded your db from? Next, I would suggest deleting everything, making sure the required disk space is available, and then starting the extraction again. I have heard of a similar problem before, but I cannot remember exactly which file was missing or why. If you provide links to the db locations, we can go through the extraction together and see whether any files are really missing, and why.

mxs
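The "start the extraction again" step could be scripted roughly as follows. This is only a sketch under assumptions: it assumes the downloaded archives are named like `nt.00.tar.gz` and sit in one folder, and the function name `extract_volumes` and the `min_free_bytes` threshold are made up for illustration. Every archive is unpacked into the same directory, so a file that appears in several archives is simply overwritten by the later copy.

```python
import shutil
import tarfile
from pathlib import Path


def extract_volumes(archive_dir, db_dir, min_free_bytes=0):
    """Extract every nt.*.tar.gz found in archive_dir into db_dir.

    Stops with an OSError if the free space on the target drive drops
    below min_free_bytes before an archive is unpacked.
    Returns the names of the archives that were extracted, in order.
    """
    archive_dir, db_dir = Path(archive_dir), Path(db_dir)
    db_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(archive_dir.glob("nt.*.tar.gz")):
        free = shutil.disk_usage(db_dir).free
        if free < min_free_bytes:
            raise OSError(f"only {free} bytes free before {archive.name}")
        with tarfile.open(archive, "r:gz") as tar:
            # Files already present in db_dir are overwritten.
            tar.extractall(db_dir)
        extracted.append(archive.name)
    return extracted
```

Calling `extract_volumes("downloads", "blastdb", min_free_bytes=5 * 1024**3)` would refuse to continue once less than about 5 GB is free, which makes a disk-space problem visible instead of leaving a half-extracted volume behind. The folder names and the threshold here are placeholders.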

ADD REPLY • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by mxs ▴ 530

Okay, this is the site: ftp://ftp.ncbi.nlm.nih.gov/blast/db/

I need all the nt (nucleotide) databases. I have the BLAST-2.2.30+ already installed and working. I have enough space to get all 27 of the nt files.

I extract nt.00 first, it works fine. I extract nt.01, but because a nt.pal file already existed after extracting nt.00, the nt.pal file from nt.01 does not get copied or it replaces the first nt.pal file. I am thinking that this is where there is a problem because the nt.pal files keep replacing each other so that it is not being stitched properly.

ADD REPLY • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by balasink ▴ 10

OK, *.pal refers to a protein alias file, which, if left over from a previous extraction of the nr database, will cause problems when BLAST tries to use the nt database (since it will not locate the necessary files). What you should see in your database directory is a *.nal file if the nt database is used. Do not mix the two, and I think you should be fine.

I must admit I never use pre-indexed databases; what I usually do is download the FASTA file and format it myself. That way no mix-ups occur.
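For reference, formatting a downloaded FASTA file with BLAST+ might look like the sketch below. It assumes `makeblastdb` from BLAST+ is on the PATH and that the FASTA file has already been downloaded and decompressed; the file name `nt.fasta` and the helper names are hypothetical.

```python
import subprocess


def makeblastdb_command(fasta_path, db_name="nt"):
    """Build the makeblastdb call for a nucleotide database."""
    return [
        "makeblastdb",
        "-in", str(fasta_path),   # input FASTA file
        "-dbtype", "nucl",        # nucleotide, so a .nal alias rather than .pal
        "-out", db_name,          # base name of the resulting database files
    ]


def build_database(fasta_path, db_name="nt"):
    """Run makeblastdb and raise CalledProcessError if it fails."""
    subprocess.run(makeblastdb_command(fasta_path, db_name), check=True)
```

`build_database("nt.fasta")` would then produce a database that can be searched with `-db nt`. Note that formatting the full nt collection this way still needs disk space for both the FASTA file and the formatted database.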

I haven't downloaded all the nt.* db fractions, so there is a possibility that a *.pal file was accidentally included in one of them. Otherwise I see no other explanation for a .pal file being there.

ADD REPLY • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by mxs ▴ 530

Whoops, sorry, I meant *.nal files.

So if I have a .nal file from nt.00, and I extract nt.01, which also has a .nal file, do I replace the original one or delete it? (It doesn't give me the option to keep both.)

ADD REPLY • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by balasink ▴ 10

You can just comment on my comment; you don't need to open a new answer post :)

nt.00...

#
# Alias file created 03/11/2015 15:06:33
#
TITLE Nucleotide collection (nt)
DBLIST "nt.00" "nt.01" "nt.02" "nt.03" "nt.04" "nt.05" "nt.06" "nt.07" "nt.08" "nt.09" "nt.10" "nt.11" "nt.12" "nt.13" "nt.14" "nt.15" "nt.16" "nt.17" "nt.18" "nt.19" "nt.20" "nt.21" "nt.22" "nt.23" "nt.24" "nt.25" "nt.26" "nt.27"
NSEQ 28859882
LENGTH 87954442938

nt.01...

#
# Alias file created 03/11/2015 15:06:33
#
TITLE Nucleotide collection (nt)
DBLIST "nt.00" "nt.01" "nt.02" "nt.03" "nt.04" "nt.05" "nt.06" "nt.07" "nt.08" "nt.09" "nt.10" "nt.11" "nt.12" "nt.13" "nt.14" "nt.15" "nt.16" "nt.17" "nt.18" "nt.19" "nt.20" "nt.21" "nt.22" "nt.23" "nt.24" "nt.25" "nt.26" "nt.27"
NSEQ 28859882
LENGTH 87954442938

As you can see, they are identical, as they should be, so replacing one with the other makes no difference. This is just the alias file that tells BLAST how many database fragments there are, so it knows how many it needs to go through before doing the final statistics and result reporting.
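Since the alias file lists every fragment, it can also be used to check whether all of them were actually extracted. The sketch below reads the `DBLIST` line shown above and reports the volumes whose files are not in the database directory. It rests on an assumption: that each nucleotide volume consists of `.nhr`, `.nin` and `.nsq` files, which may differ between database versions. The function names are made up for illustration.

```python
import shlex
from pathlib import Path


def read_dblist(alias_path):
    """Return the volume names from the DBLIST line of an alias file."""
    for line in Path(alias_path).read_text().splitlines():
        if line.startswith("DBLIST"):
            # shlex strips the double quotes around each volume name.
            return shlex.split(line)[1:]
    return []


def missing_volumes(db_dir, alias_name="nt.nal",
                    extensions=(".nhr", ".nin", ".nsq")):
    """List volumes named in the alias file that lack any expected file."""
    db_dir = Path(db_dir)
    missing = []
    for volume in read_dblist(db_dir / alias_name):
        if not all((db_dir / (volume + ext)).exists() for ext in extensions):
            missing.append(volume)
    return missing
```

If `missing_volumes("blastdb")` returns a non-empty list, a search with `-db nt` is likely to fail, because the alias refers to fragments that are not there. That would fit the situation where only nt.00 to nt.03 had been extracted. The directory name `blastdb` is a placeholder.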

ADD REPLY • link updated 2.5 years ago by Ram 44k • written 9.7 years ago by mxs ▴ 530

oh thanks! :)

ADD REPLY • link 9.7 years ago by balasink ▴ 10

no problem :)

ADD REPLY • link 9.7 years ago by mxs ▴ 530