Error with makeblastdb on a UniProt FASTA file: no sequences were found
5.7 years ago

Hello there

I am trying to create a custom BLAST database so that I can run a local BLAST search against some of my own protein sequences.

First, I retrieved the sequences by their IDs with the UniProt Retrieve/ID mapping tool. Then I downloaded the output in FASTA (canonical) format.

With this file, I ran 'makeblastdb' from the BLAST 2.8.1+ standalone executables on the Windows 10 command line (the FTP package I downloaded is labelled 'Windows 7'):

makeblastdb -in prots.fasta -parse_seqids -blastdb_version 5 -dbtype prot -out prots

and I always get the same error:

No volumes were created because no sequences were found

I have tried several variations of the command line, such as dropping '-parse_seqids' or including/excluding the '-out' option. Likewise, I have tried several changes to my FASTA file, such as replacing '|' with an underscore, replacing spaces with underscores, or shortening the IDs... and I always got the same error.

What puzzles me is that the command worked when the input FASTA file contained only one sequence, although I got only two output files: '.pdb' and '.pdb-lock'.
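Since a one-record file works but multi-record files do not, it may help to rule out file-level quirks that are easy to miss on Windows before changing the headers further. The sketch below is only a diagnostic aid under stated assumptions: it counts records and flags a UTF-8 byte-order mark, CRLF line endings, blank lines and non-ASCII characters. None of these is confirmed in this thread as the cause of the makeblastdb error; they are just cheap things to check. The function name `fasta_report` is hypothetical.

```python
import sys


def fasta_report(text: str) -> dict:
    """Summarise FASTA properties that are easy to overlook on Windows.

    The flags are hints for manual inspection, not known causes of the error.
    """
    lines = text.splitlines()  # handles both \n and \r\n endings
    return {
        # number of lines starting with '>' (one per FASTA record)
        "records": sum(1 for ln in lines if ln.startswith(">")),
        # a UTF-8 byte-order mark in front of the first '>'
        "has_bom": text.startswith("\ufeff"),
        # Windows (CRLF) line endings
        "has_crlf": "\r\n" in text,
        # empty or whitespace-only lines
        "blank_lines": sum(1 for ln in lines if not ln.strip()),
        # 1-based line numbers that contain non-ASCII characters
        "non_ascii_lines": [i + 1 for i, ln in enumerate(lines)
                            if any(ord(c) > 127 for c in ln)],
        "first_line_is_header": bool(lines) and lines[0].startswith(">"),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # newline="" keeps \r\n intact so CRLF endings can be detected
    with open(sys.argv[1], encoding="utf-8", errors="replace", newline="") as fh:
        for key, value in fasta_report(fh.read()).items():
            print(f"{key}: {value}")
```

Usage (script name is hypothetical): `python fasta_report.py prots.fasta`. If `records` matches the number of sequences you expect and nothing else is flagged, the file layout itself is probably not the problem.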

Any idea what is going wrong? How can the problem occur with more than one sequence in the file but not with only one?

I have searched many different forums and have not found anything similar...

Thanks a lot

Tags: makeblastdb, uniprot, fasta, error, blast

Is there any chance you could move this off Windows and onto Unix? If you have Windows 10, you could install the Windows Subsystem for Linux.

BLAST v2.8.1 has new functionality to limit BLAST searches to a list of sequence IDs (if your IDs are standard accessions), taxIDs, etc. Is that something you could use?

 -seqidlist <String>
   Restrict search of database to list of SeqIDs
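To try the -seqidlist route, the IDs first have to be pulled out of the UniProt download into a list. The sketch below assumes UniProt's usual header layout (`>sp|ACCESSION|ENTRY_NAME ...` or `>tr|...`) and writes one accession per line. Two loud assumptions: that a plain-text list is the format your BLAST+ version accepts for -seqidlist (check the BLAST+ user manual for your version), and that UniProt accessions actually resolve to entries in NCBI's nr; nr is built mostly from NCBI protein accessions, so some or all UniProt IDs may not match. The function names are hypothetical.

```python
import re
import sys

# UniProt FASTA headers look like ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha ..."
# (sp = Swiss-Prot, tr = TrEMBL); the accession is the second '|'-separated field.
UNIPROT_HEADER = re.compile(r"^>(?:sp|tr)\|([^|\s]+)\|")


def uniprot_accessions(fasta_text: str) -> list[str]:
    """Return accessions from UniProt-style headers, in file order, without duplicates."""
    found = []
    for line in fasta_text.splitlines():
        match = UNIPROT_HEADER.match(line)
        if match:
            found.append(match.group(1))
    return list(dict.fromkeys(found))  # de-duplicate while keeping order


def format_seqidlist(accessions: list[str]) -> str:
    """One ID per line, newline-terminated (plain-text ID list)."""
    return "".join(acc + "\n" for acc in accessions)


if __name__ == "__main__" and len(sys.argv) > 2:
    with open(sys.argv[1], encoding="utf-8") as src:
        ids = uniprot_accessions(src.read())
    # newline="\n" avoids writing Windows CRLF endings into the list
    with open(sys.argv[2], "w", encoding="utf-8", newline="\n") as dst:
        dst.write(format_seqidlist(ids))
```

Headers that do not follow the `sp|`/`tr|` layout are silently skipped, so compare the number of IDs written against the number of sequences you downloaded.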

Thanks genomax.

I think the option you suggest is not suitable for my issue, since my problem is in creating the database itself, not in searching within it.


The option I was suggesting uses nt/nr from NCBI itself and limits the search to the IDs you specify.


Got it. I will try to do it that way.

However, with this solution I would skip creating my own database, wouldn't I? So if one of my sequences is not in the NCBI databases (or nothing sufficiently similar is), I would miss that BLAST hit, am I right?

I suppose this option also works with protein databases.

On the other hand, I wonder whether my FASTA file has a formatting problem... but I cannot find what it is!


It looks like your problem is with creating a version 5 database. If you take out -blastdb_version 5, the command seems to work on Windows.
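If you script the database build, one way to encode this workaround is to try the version 5 format first and fall back to omitting -blastdb_version when the "no sequences were found" message appears. This is a minimal sketch, assuming makeblastdb is on PATH; only flags already used in this thread are passed. Because it is not established here whether makeblastdb 2.8.1 reports this error through a non-zero exit code, the sketch checks both the exit code and the printed output. The helper names are hypothetical.

```python
import shutil
import subprocess
import sys


def makeblastdb_args(fasta, out, dbtype="prot", parse_seqids=True, blastdb_version=None):
    """Build the makeblastdb argument list; blastdb_version=None omits the flag."""
    args = ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", out]
    if parse_seqids:
        args.append("-parse_seqids")
    if blastdb_version is not None:
        args += ["-blastdb_version", str(blastdb_version)]
    return args


def no_sequences_found(output: str) -> bool:
    """True if the output contains the error message reported in this thread."""
    return "No volumes were created because no sequences were found" in output


def build_db(fasta, out):
    """Try a version 5 database first, then retry without -blastdb_version."""
    if shutil.which("makeblastdb") is None:
        raise FileNotFoundError("makeblastdb is not on PATH")
    for version in (5, None):
        args = makeblastdb_args(fasta, out, blastdb_version=version)
        result = subprocess.run(args, capture_output=True, text=True)
        if result.returncode == 0 and not no_sequences_found(result.stdout + result.stderr):
            break  # this attempt appears to have succeeded
    return args, result  # last attempt, successful or not


if __name__ == "__main__" and len(sys.argv) > 2:
    used, res = build_db(sys.argv[1], sys.argv[2])
    print("ran:", " ".join(used), "-> exit code", res.returncode)
```

In BLAST+ 2.8.1, leaving out -blastdb_version means the older database format is written, which matches the follow-up below that the old format works.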


Seems to work!

Thanks a lot. I will continue with downstream analyses in Genome Workbench, and if I run into any other issue I will post it here.

Cheers!


There appears to be some strangeness with the v5 database format. I wonder whether there is a specific requirement for FASTA headers that is not clearly spelled out. Glad to hear the old format works and can be used in your case.
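The thread does not establish what, if any, header requirement the v5 format has. As a hedged aid only, the sketch below checks two header properties that matter whenever -parse_seqids is used: every record should have an ID, and IDs should be unique. It takes the ID to be the first whitespace-delimited token after '>', which is how makeblastdb is generally understood to split headers; treat that as an assumption for your version. The function names are hypothetical.

```python
from collections import Counter


def header_ids(fasta_text: str) -> list[str]:
    """First whitespace-delimited token of each '>' line ('' if the line has none)."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split(maxsplit=1)
            ids.append(tokens[0] if tokens else "")
    return ids


def header_problems(fasta_text: str) -> dict:
    """Count records without an ID and list IDs that occur more than once."""
    ids = header_ids(fasta_text)
    counts = Counter(ids)
    return {
        "empty_ids": ids.count(""),
        "duplicate_ids": sorted(i for i, n in counts.items() if i and n > 1),
    }
```

An empty result (`{"empty_ids": 0, "duplicate_ids": []}`) does not prove the headers are acceptable to makeblastdb; it only rules out these two problems.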
