Dear all,
I'm attempting to create a custom BLAST database from a dozen or so whole genomes. For downstream analyses the taxonomy IDs (taxids) need to be included in the BLAST database. This seems like it should be simple enough using the -parse_seqids and -taxid_map <taxmap.txt> options in makeblastdb, but alas, apparently not.
My fasta headers look like:
>HG380758.1
>HG380759.1
>HG380760.1
...
and my taxid_map.txt file looks like:
HG380758.1 104782
HG380759.1 104782
HG380760.1 104782
...
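Since every sequence in a given genome file shares one taxid, a map in this format can be generated from the FASTA headers instead of being written by hand. Below is a minimal shell sketch. It assumes one taxid per input file and that the seqid is the first whitespace-delimited token after the >, as in the headers above. make_taxid_map is a hypothetical helper name, not part of BLAST+.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAST+): build a makeblastdb taxid_map
# from a FASTA file whose sequences all belong to one taxon.
# Usage: make_taxid_map <fasta> <taxid> > taxid_map.txt
make_taxid_map() {
    fasta="$1"
    taxid="$2"
    # On each header line, $1 is the first token (e.g. ">HG380758.1");
    # drop the leading ">" and print "seqid taxid", space-separated.
    awk -v taxid="$taxid" '/^>/ { id = substr($1, 2); print id, taxid }' "$fasta"
}
```

For several genomes, the function could be run once per file with that genome's taxid, appending each result to the same taxid_map.txt with >>.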
The command I ran was:
makeblastdb -dbtype nucl -in in.fna -parse_seqids -taxid_map taxid_map.txt
Unfortunately, this returns the error:
Building a new DB, current time: 07/08/2016 11:53:59
New DB name: in.fna
New DB title: in.fna
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 36167 sequences in 8.00621 seconds.
Error: [makeblastdb] No sequences matched any of the taxids provided.
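One quick way to rule out a simple ID mismatch between in.fna and taxid_map.txt (for example, a stray > left in the map, or a header token that differs from the map key) is to list every FASTA seqid that has no map entry. The following is a minimal sketch that assumes the whitespace-delimited map format shown above. missing_taxids is a hypothetical helper name. Empty output only shows that the two files agree textually. It does not show that this makeblastdb version will accept accession-style IDs in the map.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAST+): print each FASTA sequence ID that
# has no matching first-column entry in a whitespace-delimited taxid map.
# Usage: missing_taxids <fasta> <taxid_map>
missing_taxids() {
    fasta="$1"
    map="$2"
    awk '
        # While reading the map (first file argument), remember each seqid.
        FILENAME == ARGV[1] { seen[$1] = 1; next }
        # While reading the FASTA, check the first token of each header line
        # (the seqid with its leading ">" removed) against the map.
        /^>/ {
            id = substr($1, 2)
            if (!(id in seen)) print id
        }
    ' "$map" "$fasta"
}
```

For example, missing_taxids in.fna taxid_map.txt | head would show the first few unmatched IDs, if any.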
I've read the questions "How To Make A Blast Database With Taxonids From Ncbi Query" and "Ncbi Blast+ Taxid And Taxid_Map" (also http://www.verdantforce.com/2014/12/building-blast-databases-with-taxonomy.html), and I can't see what I'm doing wrong. I have also tried reformatting the FASTA headers to include the taxid, as in >HG380758.1 taxon=104782, and prefixing the seqids in taxid_map.txt with >, but to no avail.
I'm using makeblastdb version 2.3.0+, and I note from previous similar queries that the -taxid_map parameter has not always worked.
Is this a bug with this version of makeblastdb, or am I still doing something wrong? Any help / workarounds would be much appreciated!
Thank you for the info and for the workaround. I feared this would be the case! It would be nice if this were made clearer in the makeblastdb help, but hey ho.
I am using 2.2.30+, and there accession numbers are supported.