I have to make a local database to BLAST the sequences that I got from next-generation sequencing. I'm trying to create a local nucleotide database; the issue is that the nt database on the NCBI FTP server comes in 27 parts.
When I downloaded all the parts, I only managed to have room to extract the first 3 files (nt.00, nt.01, nt.02); the rest wouldn't complete. I've read the README files and they do say "extract all the files." My desktop has 8GB RAM and room was never an issue. My C drive has 81.5GB free space now, and I'm just wondering how anyone manages to download and extract all these files for the complete database. Does everyone have a supercomputer with massive memory?
I have to get this to work as I don't have any other programs (that I can afford) that can do a BLAST for my sequences (each sample having 800+ OTUs).
Help please!
What does that mean, what is the error that you got? There is nothing special about downloading the nt databases. Very straightforward task.
Hi, so what I meant by complete was: say I downloaded the nt databases and extracted them all (I found a way to make space). But it wouldn't complete, or the `.pal` file would not stitch all the databases together. So when I tried using the `-db nt` command, it wouldn't work. I had to type `nt.00`, but I have the databases up to `nt.03` and was hoping the `.pal` file would stitch them all.
I am not sure whether each database I extract after the `nt.00` file has its own `nt.0#.pal` file. Am I supposed to replace each `.pal` file with the newly extracted one? It will not let me extract the `.pal` file because one already exists (from the `nt.00` database). So either I replace the `.pal` file each time, or I delete it.
OK, let's start from the top. Could you provide links to the locations you downloaded your db from? Next, I would suggest deleting everything and, after reserving the required disk space, starting the extraction once again. I heard of a similar problem once before, but I cannot remember exactly which file was missing and why, so if you could provide links to the db locations we can do the extraction together and see whether some files are really missing and why.
mxs
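The fresh extraction suggested above could be scripted along the following lines. This is only a minimal sketch: the function name `extract_volumes`, the directory arguments, and the `nt.*.tar.gz` archive naming pattern are assumptions for illustration, not something stated in this thread. Files with the same name in several archives (such as the alias file) are simply overwritten by the later archive.

```python
import tarfile
from pathlib import Path


def extract_volumes(archive_dir, db_dir, pattern="nt.*.tar.gz"):
    """Extract every archive matching `pattern` into one database directory.

    Returns the archive file names that were extracted, in sorted order.
    The pattern is an assumed naming scheme; adjust it to the files you have.
    """
    archive_dir, db_dir = Path(archive_dir), Path(db_dir)
    db_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(archive_dir.glob(pattern)):
        with tarfile.open(archive, "r:gz") as tar:
            # Members that already exist in db_dir are overwritten.
            tar.extractall(db_dir)
        extracted.append(archive.name)
    return extracted
```

Extracting everything into a single directory keeps all volumes next to the alias file, which is where BLAST would look for them when given `-db nt`.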
Okay, this is the site: ftp://ftp.ncbi.nlm.nih.gov/blast/db/
I need all the nt (nucleotide) databases. I have the BLAST-2.2.30+ already installed and working. I have enough space to get all 27 of the nt files.
I extract `nt.00` first, and it works fine. I extract `nt.01`, but because an `nt.pal` file already existed after extracting `nt.00`, the `nt.pal` file from `nt.01` does not get copied, or it replaces the first `nt.pal` file. I am thinking that this is where the problem is: the `nt.pal` files keep replacing each other, so the database is not being stitched properly.
Ok, `*.pal` refers to a protein alias file which, if left over from a previous extraction of the nr database, will cause problems when BLAST tries to use the nt database (since it will not locate the necessary files). What you should see in your database directory is a `*.nal` file if the nt database is used. Do not mix those two together and I think you should be fine.
I must admit I never use the pre-indexed databases; what I usually do is download the FASTA and format it. That way no mix-ups appear.
I haven't downloaded all the `nt.*` db fractions, so there is a possibility that a `*.pal` was accidentally included in one of them. Otherwise I see no other explanation for a pal being there.
Whoops, sorry, I meant `*.nal` files.
So if I have a nal file from `nt.00`, and I extract `nt.01`, which also has a nal file, do I replace the original one or delete it? (It doesn't give me the option to keep both.)
You can just comment on my comment; you don't need to open a new answer post :)
nt.00...
nt.01...
As you can see, they are the same, as they should be. Replacing one with the other makes no difference. This is just the alias file that tells BLAST how many database fragments there are, so it knows how many it needs to go through before doing the final statistics and result reporting.
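Since the alias file lists the database fragments, one way to see whether an extraction is complete is to compare that list against the files actually on disk. The sketch below assumes the alias file contains a line starting with `DBLIST` followed by the volume names, and that each nucleotide volume consists of `.nhr`, `.nin` and `.nsq` files; the function names are made up for illustration.

```python
from pathlib import Path


def read_dblist(alias_path):
    """Return the volume names on the DBLIST line of a BLAST alias file."""
    for line in Path(alias_path).read_text().splitlines():
        line = line.strip()
        if line.startswith("DBLIST"):
            # Volume names may be quoted; strip the quotes if present.
            return [name.strip('"') for name in line.split()[1:]]
    return []


def missing_volumes(db_dir, alias_name="nt.nal", extensions=(".nhr", ".nin", ".nsq")):
    """List volumes named in the alias file that lack one of the expected files."""
    db_dir = Path(db_dir)
    missing = []
    for volume in read_dblist(db_dir / alias_name):
        if not all((db_dir / (volume + ext)).exists() for ext in extensions):
            missing.append(volume)
    return missing
```

If `missing_volumes` returns a non-empty list for the database directory, some fragments were not extracted fully, which would explain `-db nt` failing while `-db nt.00` works.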
oh thanks! :)
no problem :)