I seem to be stuck on something that should be very simple, and I hope I haven't overlooked something obvious.
Question: How can I make a valid BLAST database with taxids from an NCBI query export?
What I have tried so far:
For a metagenomics project I need a custom-made BLAST database, which I want to generate from the result of the following NCBI Nucleotide query:
Viruses[Organism] AND srcdb_refseq[PROP] NOT cellular organisms [ORGN]
The result is 3986 entries, which I exported and saved (via 'Send to') in FASTA and ASN.1 format. Both files appear to contain the right number of entries. As this is a metagenomics project, I would love to have the taxon IDs in the BLAST database.
I succeeded in making a valid BLAST database from the FASTA file using makeblastdb, but the FASTA headers don't include taxids. I therefore tried to make a BLAST database from the ASN.1 export using the following command (it is not clear from the documentation which input formats can be used to create the database):
$ makeblastdb -in AllViralDNARefSeq.asn1 -dbtype nucl -out ViralASN1 -title "All Viral RefSeq DNA from NCBI ASN1"
Building a new DB, current time: 12/20/2011 10:37:28
New DB name: ViralASN1
New DB title: All Viral RefSeq DNA from NCBI ASN1
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; **added 10 sequences** in 0.00906897 seconds.
As you can see, this does not work: makeblastdb reports that it is reading the input as FASTA and adds only 10 sequences.
Any help with getting the taxon IDs in is appreciated. It doesn't have to be elegant; I just need the database from that query. I am using BLAST+ 2.2.25.
I just noticed that the manual nowhere mentions that makeblastdb supports anything other than FASTA.
Isn't this similar (without going through all your text) to this question about making a BLAST DB alias based on GIs? http://biostar.stackexchange.com/questions/15047/make-a-custom-blast-library-using-the-output-of-another-blast-result/15050#15050
Not the same question, no - they want the taxon ID in the FASTA file before formatting the database.
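One possible route, sketched here rather than tested against this data set: makeblastdb has a -taxid_map option that, together with -parse_seqids, takes a text file with one "<sequence id> <taxid>" pair per line. Whether it behaves the same way in BLAST+ 2.2.25 is an assumption, as is the existence of the map file itself (taxid_map.txt is a hypothetical name; it would have to be built separately, with IDs matching those in the FASTA headers). The helper names below are made up for illustration.

```shell
# Sketch only: taxid_map.txt is a hypothetical file you would have to
# create yourself, one "<sequence id> <taxid>" pair per line.

# Number of FASTA records (header lines starting with ">").
count_records() {
    grep -c '^>' "$1"
}

# Number of well-formed map lines: exactly two fields, the second an integer.
count_map_entries() {
    awk 'NF == 2 && $2 ~ /^[0-9]+$/ { n++ } END { print n + 0 }' "$1"
}

# Build the database only if the two counts agree. This is a cheap sanity
# check, not a comparison of the actual sequence IDs.
build_taxid_db() {
    fasta=$1; map=$2; out=$3
    if [ "$(count_records "$fasta")" -ne "$(count_map_entries "$map")" ]; then
        echo "record count and taxid map size differ" >&2
        return 1
    fi
    makeblastdb -in "$fasta" -dbtype nucl -parse_seqids \
        -taxid_map "$map" -out "$out"
}
```

With the functions loaded, a call would look like: build_taxid_db AllViralDNARefSeq.fasta taxid_map.txt ViralTaxid (the FASTA and output names are placeholders). The count check is there because a silent mismatch, like the 10-versus-3986 result above, is easy to miss otherwise.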