I am trying to index the NT database. I have used makeblastdb before and it has worked perfectly. But this time, when I indexed my NT database, I got this message: "Adding sequences from FASTA; added 27200239 sequences in 55053 seconds." I did specify an output file path, but the result is not there.
./makeblastdb -in <nt path here> -title nt3.2_new -dbtype nucl -out <output path here>/nt3.2New/ -parse_seqids
Above is the command I used (<nt path here> and <output path here> stand for the real paths, which I have abbreviated). I found a similar issue posted here, but the OP found the result files without explaining where they were. I have been using grep to try to locate them, but no luck. Any ideas or suggestions would be highly appreciated. Thanks!
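For reference, a search for the output files could look like the sketch below. It assumes the database was written somewhere under the directory given to -out and that the files carry the usual nucleotide BLAST database extensions (.nhr, .nin, .nsq, plus .nal for a multi-volume alias). OUT_DIR and find_blast_db_files are placeholder names. Since -out is a file name prefix rather than a directory, a value ending in a slash may produce names that start with a dot, which a plain ls hides; that is an assumption worth checking, not a confirmed diagnosis, so the search includes dotfiles.

```shell
#!/bin/sh
# Placeholder: set OUT_DIR to the directory that was passed to -out.
OUT_DIR="${OUT_DIR:-.}"

# List nucleotide BLAST database files under a directory.
# find's -name also matches hidden names such as ".00.nhr".
find_blast_db_files() {
    find "$1" -type f \
        \( -name '*.nhr' -o -name '*.nin' -o -name '*.nsq' -o -name '*.nal' \) \
        2>/dev/null
}

find_blast_db_files "$OUT_DIR"
```

If this prints nothing, the same search can be widened by pointing OUT_DIR at a parent directory.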
Is there a reason you are making your own when NCBI makes pre-made indexes for NT available? ftp://ftp.ncbi.nih.gov/blast/db/
Yes, I am using a filtered NT, meaning I have filtered out the environmental samples. So I have to index it on my own.
If you have a list of GIs for the env samples, you could use blastdb_aliastool to create the subset BLAST database.
Not sure if it would be any faster since you already have the filtered NT files available.
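A minimal sketch of the blastdb_aliastool route, assuming the pre-made nt database is already downloaded and findable by BLAST. The file and database names (keep.gi.txt, nt_no_env) are hypothetical. Note that -gilist restricts the alias to the listed GIs, so the list here is assumed to hold the sequences to keep rather than the environmental samples to drop.

```shell
#!/bin/sh
# Hypothetical names: adjust to your own files.
SOURCE_DB="${SOURCE_DB:-nt}"            # existing pre-made database
GI_LIST="${GI_LIST:-keep.gi.txt}"       # one GI per line, sequences to KEEP
ALIAS_NAME="${ALIAS_NAME:-nt_no_env}"   # name of the alias database to create

# Build the alias only when the tool and the GI list are actually present.
if command -v blastdb_aliastool >/dev/null 2>&1 && [ -f "$GI_LIST" ]; then
    blastdb_aliastool -db "$SOURCE_DB" -dbtype nucl \
        -gilist "$GI_LIST" \
        -out "$ALIAS_NAME" -title "$ALIAS_NAME"
else
    echo "skipped: blastdb_aliastool or $GI_LIST not found" >&2
fi
```

The alias is a small pointer file over the existing database, so no sequences are re-indexed.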