Hello there
I am trying to create a custom database for BLAST to run a local BLAST with some of my own protein sequences.
First, I retrieved the sequences by ID with the UniProt Retrieve/ID mapping tool. Then, I downloaded the output in FASTA (canonical) format.
With this file, I tried to run 'makeblastdb' from the BLAST 2.8.1+ standalone executables on the command line in Windows 10 (the version I downloaded from the FTP site is labelled 'Windows 7'):
makeblastdb -in prots.fasta -parse_seqids -blastdb_version 5 -dbtype prot -out prots
and I always get the same error:
No volumes were created because no sequences were found
I have tried several modifications of the command line, such as dropping '-parse_seqids' or including/excluding the '-out' option. Similarly, I have tried several modifications of my FASTA file, such as changing '|' to an underscore, replacing spaces with underscores, or shortening the IDs, and I always got the same error.
What intrigues me is that the command worked when the input FASTA file contained only one sequence, although I only got two output files, i.e. '.pdb' and '.pdb-lock' files.
Any idea what is going wrong? How is it possible to have the problem with more than one sequence in the file but not with only one?
I have searched many different forums and I did not find anything similar...
Thanks a lot
Is there any chance you could move this off Windows and onto Unix? If you have Win 10 you could install Windows Subsystem for Linux.
BLAST v.2.8.1 has new functionality to limit BLAST searches to sequence IDs (if your IDs are standard accessions), taxIDs, etc. Is that something you can use?
Thanks genomax.
I think the option you suggest is not suitable for my issue, since my problem is in creating the database itself, not in searching within it.
The option I was suggesting will use nt/nr from NCBI itself to limit the search to the IDs you specify.
Got it. I will try to do it that way.
However, with this solution, I will skip the creation of my own database, won't I? So, if I have a sequence that is not in NCBI databases (or at least not so similar) I will lose the BLAST query to it, am I right?
I suppose this command will work also on protein databases
On the other hand, I wonder if maybe I have a problem of format in the fasta file... But I cannot find what it is!
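One way to rule out a formatting problem is a quick sanity check of the FASTA file before running makeblastdb. The Python sketch below is only an illustration. The function name summarize_fasta and the sample headers are made up for this example, and it checks just a few things: the record count, empty or duplicate IDs, sequence data before the first header, and a leading byte-order mark. It is not a statement of what makeblastdb actually requires.

```python
from collections import Counter


def summarize_fasta(text):
    """Collect basic statistics and obvious problems from FASTA-formatted text.

    Hypothetical helper for illustration; it does not replicate makeblastdb's
    own parsing rules.
    """
    issues = []
    # A byte-order mark in front of the first '>' hides the first header.
    if text.startswith("\ufeff"):
        issues.append("file starts with a byte-order mark (BOM)")
        text = text.lstrip("\ufeff")

    ids = []
    seen_header = False
    # splitlines() copes with Windows (\r\n) and Unix (\n) line endings alike.
    for line_no, line in enumerate(text.splitlines(), start=1):
        if line.startswith(">"):
            seen_header = True
            header = line[1:].strip()
            if not header:
                issues.append(f"line {line_no}: empty header")
                continue
            # The ID is taken to be everything up to the first whitespace.
            ids.append(header.split()[0])
        elif line.strip() and not seen_header:
            issues.append(f"line {line_no}: sequence data before any header")

    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    if duplicates:
        issues.append("duplicate IDs: " + ", ".join(duplicates))

    return {
        "records": len(ids),
        "ids": ids,
        "duplicates": duplicates,
        "issues": issues,
    }


if __name__ == "__main__":
    # Made-up UniProt-style headers, not real entries.
    sample = (
        ">sp|X00001|DEMO1_HUMAN Hypothetical protein 1\n"
        "MKTAYIAKQR\n"
        ">sp|X00002|DEMO2_HUMAN Hypothetical protein 2\n"
        "MKVLAAGIVG\n"
    )
    report = summarize_fasta(sample)
    print(report["records"], "records;", report["issues"] or "no obvious issues")
```

To check a real file, read it with open("prots.fasta", encoding="utf-8") and pass the contents to summarize_fasta. If the record count is zero or lower than expected, the file itself is worth a closer look before blaming the makeblastdb options.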
It looks like your problem is trying to create v5 database indexes. If you take out -blastdb_version 5 the command seems to work on Windows.
Seems to work!
Thanks a lot. I will try to continue with downstream analyses on Genome Workbench and if I have any other issue I will post it here.
Cheers!
There appears to be some strangeness with the v5 database format. I wonder if there is a specific requirement for FASTA headers that is not correctly spelled out. Glad to hear the old format works and can be used in your case.
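For anyone scripting this, the invocation that worked in this thread is the original command minus -blastdb_version 5. Below is a minimal Python sketch of wrapping it, assuming makeblastdb is on PATH. The helper names build_makeblastdb_cmd and run_makeblastdb are hypothetical, and only flags already used in this thread appear.

```python
import shutil
import subprocess


def build_makeblastdb_cmd(fasta, out, dbtype="prot", parse_seqids=True):
    """Build the makeblastdb argument list without -blastdb_version.

    Leaving the version flag out lets makeblastdb fall back to its default
    database format, which is what worked in this thread.
    """
    cmd = ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", out]
    if parse_seqids:
        cmd.append("-parse_seqids")
    return cmd


def run_makeblastdb(fasta, out):
    """Run makeblastdb and return the CompletedProcess (assumes it is on PATH)."""
    if shutil.which("makeblastdb") is None:
        raise FileNotFoundError("makeblastdb not found on PATH")
    # check=False so the caller can inspect stderr instead of getting an exception.
    return subprocess.run(
        build_makeblastdb_cmd(fasta, out),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Print the command only; call run_makeblastdb("prots.fasta", "prots") to execute it.
    print(" ".join(build_makeblastdb_cmd("prots.fasta", "prots")))
```

Passing the arguments as a list avoids shell quoting differences between Windows and Unix, so the same script should behave the same way in either environment.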